The Debugging Mindset (acm.org)
246 points by jodooshi on April 3, 2017 | 100 comments

The scientific-method-like "general approach to debugging" that the author describes in the last third of the paper works, but it's pretty inefficient. The number of possible hypotheses that may cause an observed bug is very large; if you test each individually, you may have to go through many permutations, and you're not systematically gathering data that may help you narrow the search space.

Rather, I've found "divide and conquer" debugging to be much more effective. You should know (based on your domain knowledge of the program) the rough sequence of operations between the input and output of your code; if you don't, try single-stepping through it or adding log statements. Pick a point approximately halfway across that computation, and inspect (via debugger, println, or unit tests) the state of the program at that point. Does it match your mental model? If so, the bug is between there and the output; repeat the process on the latter half of the code. If not, the bug is between the input and your intermediate point; repeat the process on this subset. As an added benefit, you can often find and fix other, unrelated bugs that may not be symptomatic yet but are silently waiting to trigger strange results.

This process is O(log N) in the size of the code path, while the "formulate hypothesis and test it" approach is potentially combinatoric if the code interacts with itself.
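A minimal sketch of the bisection described above, assuming the code path can be modelled as a list of stages run in order, that the input itself is good, and that once the state goes wrong it stays wrong. The names `run_prefix`, `first_bad_stage`, and `matches_model` are hypothetical, and the "mental model" is stood in for by a known-good reference pipeline, which you rarely have in practice.

```python
from typing import Any, Callable, Optional, Sequence

Stage = Callable[[Any], Any]


def run_prefix(stages: Sequence[Stage], data: Any, k: int) -> Any:
    """Feed `data` through the first k stages and return the intermediate state."""
    for stage in stages[:k]:
        data = stage(data)
    return data


def first_bad_stage(
    stages: Sequence[Stage],
    data: Any,
    looks_ok: Callable[[Any, int], bool],
) -> Optional[int]:
    """Return the index of the first stage whose output fails `looks_ok`.

    `looks_ok(state, k)` answers "does the state after k stages match my
    mental model?". Returns None if the final output already looks fine.
    Assumes the untouched input (k == 0) is good.
    """
    lo, hi = 0, len(stages)  # state after `lo` stages is good, after `hi` is bad
    if looks_ok(run_prefix(stages, data, hi), hi):
        return None
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if looks_ok(run_prefix(stages, data, mid), mid):
            lo = mid  # bug is between mid and the output
        else:
            hi = mid  # bug is between the input and mid
    return hi - 1


# Hypothetical five-stage pipeline; stage 2 is deliberately broken in `buggy`.
reference = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
    lambda x: x + 10,
]
buggy = list(reference)
buggy[2] = lambda x: x + 3


def matches_model(state: Any, k: int) -> bool:
    # Stand-in for inspecting state in a debugger or with a print statement.
    return state == run_prefix(reference, 5, k)


culprit = first_bad_stage(buggy, 5, matches_model)
print(culprit)
```

Each loop iteration halves the suspect range, so the number of inspections grows with the logarithm of the number of stages.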

Also, I observed that as you get more and more experienced, intuition ("hunch") works surprisingly well. It's not always reliable and sometimes it's just plain wrong though.

From the article: "Intuition can be an effective strategy for debugging but requires extensive experience and, when used as the only strategy, leaves the programmer unprepared to handle new, unfamiliar bugs."

On the other hand, I don't see how you can keep experience and intuition from being a factor in any instance of the scientific method.

It sounds like a moot point. It's not possible to acquire the extensive experience necessary to have intuition without having another strategy in the first place. Intuition is the long-term consequence of employing a strategy: no longer having to step through the same sequence every time.

Exactly this.

Indeed intuition is needed. When John Nash, a mathematician with schizophrenia, was asked why he believed such irrational things when he was such a brilliant man, he responded along the lines of "Those irrational ideas came to me the same exact way my rational, groundbreaking ideas came to me." So intuition will always be needed; we just need to remember to check it.

>> It's not always reliable and sometimes it's just plain wrong though.

Yep, and in my experience this can result in wasting huge amounts of time. I've lost count of the number of times I've followed my intuition down some rabbit hole, and several hours later found that the issue was something totally unrelated. This usually happens when I've been coding for a while and am tired. :)

Well, (well developed) intuition is on average a net win. For any given problem, it can let you down badly.

I try to do something like "follow intuition, but with a timer started". If following the intuition fails to bear fruit by the time it "should", I start to question if I'm on the right track.

The "divide-and-conquer" paradigm extends to selection of input data as well: A few minutes testing known input sets, starting with the simplest, can save hours with the debugger. If none of the simpler input sets at hand repros the failure, it may be profitable to attempt to craft one.

Now with a minimal failure case in hand, proceed with temporal "divide-and-conquer" along the lines you've so nicely laid out.
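One way to craft a smaller failing input is to delete chunks of a known failing input for as long as the failure still reproduces. The sketch below is a simplified greedy reduction, not a full delta-debugging implementation; `shrink` and `fails` are hypothetical names, and the result is smaller but not guaranteed to be the smallest possible input.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shrink(items: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily drop chunks of `items` while `fails` still reports the bug."""
    current = list(items)
    if not fails(current):
        raise ValueError("the starting input must reproduce the failure")
    chunk = max(1, len(current) // 2)
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if fails(candidate):
                current = candidate  # chunk was irrelevant, keep it removed
            else:
                i += chunk  # chunk is needed to reproduce, move past it
        chunk //= 2  # retry with finer-grained deletions
    return current


# Hypothetical bug: it only triggers when records 7 and 13 are both present.
def fails(records: List[int]) -> bool:
    return 7 in records and 13 in records


smallest = shrink(list(range(20)), fails)
print(smallest)
```

In a real setting `fails` would run the program under test on the candidate input and report whether the original symptom still appears.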

This is a surprisingly accurate description of the general sort of debugging technique I've gotten better at as I get better at programming. I find that logically it tends to make more sense to me to start with a high-level function, and step over its component function calls/loops one by one, inspecting the program state and diving into any that seem to be producing strange results. But overall, I definitely just find it's a lot more effective to start from a broad 'does everything look okay at this point?' than from a narrow 'I think the problem is this'.

> 'does everything look okay at this point?' than from a narrow 'I think the problem is this'.

Even in a small real world codebase with just a few tens of thousands of lines and a couple hundred possible user interactions, you need the intuition for "I think the problem is this" or you're gonna be there for the rest of the year stepping through the code. Once you've scoped the problem to a potential few tens of functions, then you can spelunk through it to find the exact issue.

Sure - I didn't really mean to imply that starting at main() every time was the way to go about things. I guess my point is just that I find it's usually a lot more useful to methodically explore than it is to sit and hypothesize. I dunno, I suppose (like a sibling suggested) it's pretty much the same thing, but I feel like I usually prefer to bust out the debugger as soon as I have any inkling of where the problem could be, rather than spending lots of time beforehand thinking about it.

> You should know (based on your domain knowledge of the program) the rough sequence of operations between the input and output of your code

This part is called experience, I think.

Might the choice of pivots in your binary search be called a hypothesis? I suspect you and TFA are talking about similar approaches, but giving them different names.

Exactly. Have probably done that bisection thing a lot of times - but often you have to have that prior "domain knowledge", or more aptly familiarity with a legacy code base, to a point where it's theories and hypotheses all the way (*). How would you do a bisection or even a gradient descent on hypotheses? So for me the main point is just to realize quickly enough when the beginning theory is futile, so I should switch gears to another one or to different methods. Still, this wasting of hours often happens, and I'm doubtful that you can ever remove it from programming, let alone "scientifically" as the article suggests.

(*) "the rough sequence of operations between the input and output of your code" - this phrasing also suggests being far away from debugging hell, a.k.a. concurrency. At that point, at the latest, your "bisection" also becomes "theory" and guesswork, I presume.

Concurrency bugs are a whole other can of worms, and can be maddeningly difficult to track down. You can drastically shrink the search space with a few simple techniques (immutable data, deterministic locking orders, message-passing, not sharing state between threads) though. When you find a concurrency bug, it's almost always because you have shared mutable state accessed by multiple threads. Don't do that - or access the state only through a well-defined abstraction that's proven correct like a producer-consumer queue, MVCC, or Paxos - and the number of places concurrency bugs can slip in is dramatically reduced.

Grasping the "no shared state" guideline was a big step forward for me in learning how to write safer concurrent code. My earliest attempts at writing multi-threaded code in the late 90s used thread pool approaches. One of Java's big selling points when it was new in the late 90s was threading support. Somehow pools of worker threads executing the same code, often with each thread blocking synchronously when handling an HTTP request, seemed to become the dominant paradigm. Different threads executing the same logic inevitably leads to the dreaded shared mutable state, and to deadlocks and races. Enter Sam Rushing's Medusa circa 2001/2, which made a virtue out of the necessity of Python's GIL. Coding single-threaded, async, non-blocking code in a callback style is a paradigm shift in terms of decomposing a problem. Once that shift is made, the big gain is avoiding shared mutable state. Now, when I code multi-threaded C++ server code I ensure that there is no logic or code shared between the threads. Threads communicate using locking producer-consumer queues with well-defined copy semantics. The result is no deadlocks and far fewer races.
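A minimal sketch of the queue-only style described above, in Python rather than C++ and using the standard library's thread-safe `queue.Queue`. The `worker` and `run` functions, the squaring workload, and the `None` shutdown sentinel are all assumptions made for illustration; the point is only that the threads touch no common state except the two queues.

```python
import queue
import threading
from typing import Iterable, List


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Consume items until a None sentinel arrives; never touch shared state."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work for this thread
            break
        outbox.put(item * item)


def run(numbers: Iterable[int], n_workers: int = 4) -> List[int]:
    """Square `numbers` on worker threads that communicate only via queues."""
    numbers = list(numbers)
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for n in numbers:
        inbox.put(n)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Every worker has exited, so exactly len(numbers) results are waiting.
    results = [outbox.get() for _ in numbers]
    return sorted(results)  # arrival order is nondeterministic, so sort


squares = run(range(10))
print(squares)
```

Because the only synchronisation lives inside the queue implementation, there is no lock ordering in the application code to get wrong.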

Not sure it is a hypothesis unless the hypothesis is that the bug is either in subsection A or subsection B (and yes, there are the odd cases where both or neither is the answer). Mostly the dissection is about reducing the space in which eventually either an exhaustive search (like stepping through) is conducted or a hypothesis is tested. Experience with similar systems and their architecture is extremely valuable for partitioning the system.

Whenever I form a hypothesis too early and test it, I run the danger of going down bunny trails. I may see something wrong but I can't be sure the wrong is not caused by other systems.

I always test the most basic assumptions and functionality in an isolated test. This usually highlights the problem much faster.

Thanks for this comment. I did worry that this wouldn't come across very well, but my attempts to make it explicit all ended up poorly worded or read (to me) patronizingly. Still, to me it seems that you actually agree with what I've written.

Forming a hypothesis it seems is the least understood / most confused part of approaching something scientifically. It does not mean that you need to test each thing individually every time you touch something. "Changing individual variables" as applied to this doesn't mean adding one log line in one place and moving it one step every time. Get the information you need! Be practical! I recognize state space explosion (the combinatoric complexity you refer to) in the article as a primary reason that debugging is difficult -- this complexity makes it difficult to form absolutely correct mental models about software.

I'd also mention that if you're not systematically gathering data, you're not being very scientific. I didn't dig too deep into that because I wanted to avoid being patronizing. If folks aren't familiar with the scientific method, there are probably better references than me.

I think what you're calling "divide and conquer," I am including (implicitly) as a strategy in the information gathering portion of the scientific method. I wish that I had more room to put more of these sorts of strategies in, but my article is already past the recommended length of Queue articles.

The divide and conquer strategy is generally effective, but it's not a universal tool. Debugging a race condition in some concurrent systems using this strategy is one example of where it fails if you're using it to get to a hypothesis. It might get you to "there is a race condition in my code," but this is not a hypothesis. It will not generally get you to, "there is a race condition between points A and B of my program when condition C is true and object O is not yet initialized." You can hopefully see how I can go from here to bring combinatoric complexity back into this strategy, turning O(log N) into superlinear O(log N!) (since factorial growth dominates exponential growth). And there's really nothing to be done about it: as I mention in the article, software describes the change of state in a system over time, and combinatorics is a fundamental part of state space complexity. For many modern systems, and increasingly, combinatoric complexity is a fundamental part of the equation.

Finally, I felt that including strategies that were not globally generalizable and applicable to _all_ bugs would be in error. The paper is largely about cognitive / social psychology and pedagogy, less about process. I definitely agree that divide and conquer is a great strategy to use for gathering information in the debugging process. Use the approaches that have worked well for you in the past! Share those with people! Thank you for sharing this approach with people!


Just pointing out that you are not contradicting the scientific method or in any conflict.

In fact, the way you arrived at divide and conquer is through the scientific method. Tried and tested, it was better.

The point is, the scientific method is just a tool. Where it is used is significant, as is preparing for its use.

Arriving at "divide and conquer" --> You want to apply the scientific method at the highest abstraction level possible first. You want to be as meta as possible, then work your way down.

Dividing --> You're essentially preparing the code for easier consumption by minimizing the problem surface first. Then when a problem is found, we are still guessing and testing. Except, most of the time the problem is so obvious it doesn't feel like guessing. Or it isn't. The mistake was already part of a known theory. Maybe it was of type "typo".

Yes. But how could this be applied to a more complex and running system, say a human being for whom a doctor has to find a diagnosis?

Doctors actually do use a very similar mechanism - differential diagnosis, as popularized on House (well, probably not exactly as popularized, but someone who's actually a doctor can go over the differences). The idea is to list all the possible diseases that could cause a patient's symptoms, and then run the appropriate tests or treatments to cut the search space by as much as possible. Cutting in half is often not possible in medicine, and they also account for how dangerous & urgent the condition may be when prioritizing.

I don't know about humans (or other living beings), but it's the same process I follow when fixing a car.

I have always used a step-debugger for debugging and personally find the process somewhat thrilling, especially when debugging other people's code and learning all the tricks and techniques that they are using to get the job done.

I don't know how much time and effort I have saved over the years using tricks I gleaned stepping through code written by people much smarter than I.

There is simply nothing that comes close to learning and debugging code line-by-line, examining variables and taking branches in real time. In fact, how people can debug large existing codebases without a debugger is beyond me.

Of course, nowadays in modern webdev it seems that most developers no longer use step-debuggers and, in general, want to rewrite the app in whatever stack will look best on their resume and/or sound impressive in the pub or gym afterhours.

I long ago gave up pooh-poohing the importance of social signalling within the young-ish developer community, something that really did not exist when I was 20-something, as programmers had few choices about the stack.

I really love the gist of the OP, and agree that the misapplication of the program's mental model is the underlying cause of almost all bugs. This mental model is what I'm always trying to load fully into my brain as, IMO, true productivity occurs when I have it fully loaded and I am confident that I am covering all the edge conditions.

Debugging must not be an afterthought in educating; industry must stop insisting that bugs be interpreted as failures of individual programmers ...

Funny...I've really never separated debugging from developing as they always go hand in hand, and certainly have never thought that bugs signify "failure". In fact, to me it represents progress...

"In fact, how people can debug large existing codebases without a debugger is beyond me."

I currently work on a product with a 15-20 Mloc codebase (mostly C). The development facets which give the closest parallel to a line-by-line debugger are:

- Components in the system communicate with each other by sending signals through the framework. All such signals are traceable in all builds, and can easily be decoded. These narrow down which component is doing something wrong, and what exactly its black-box behaviour is (i.e. when I see bug B, it's because component C is giving output O after receiving input I).

- Lots of trace. All branches are traced in debug builds, unless there's a serious perf reason not to. Many interesting variables are traced too.

- Extensive UT and FV, using powerful (rubbish in other ways) frameworks. Any real-world bug should be reproducible in automated tests, which you can build and run with debug-level trace.

It's not always fast, but with these you can always find the problem line / section with typical code bugs. Obviously some types of bug (e.g. perf) often don't come down to 'a line of code' or 'a bad interaction', so aren't necessarily resolved this way, but that's the same with line-by-line debuggers, in my experience.

Tracing is great but often gives you a low signal-to-noise ratio and for new code, is often missing for the bug you're trying to investigate.

A debugger lets you add tracing as the system is executing, which in my experience is extremely productive, especially in C++ codebases where adding some logging can result in 10 minutes to rebuild and deploy. And I don't know what I would do without data breakpoints that give you the callstack and threading info when an address is written to.

In case it wasn't clear: I don't mean to suggest that no-debugger is superior to debugger.

WRT the benefits of interactive debuggers, I definitely see the advantages of faster debug cycles. Luckily, it's not often that I need to worry about the 'often missing for the bug' / 'callstack and threading info when an address is written to' style of bug by virtue of working in codebases with strong coding standards and diligent code review. Most of the bugs I need to fix are 'developer didn't think of scenario S when writing feature F' or 'developer has forgotten edge-case E', rather than 'code is broken e.g. address re-use'.

And as I said in a cousin comment, I can't deploy a system with trace. That 10 minutes is just spent building :-p

What do you use for tracing?

Nothing special - just macros (provided by the framework that the components run in), which allow the programmer to filter based on filename and/or log level, and print the filename, function and text to file.
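For readers without such a framework, a rough Python analogue of that kind of filtered trace can be put together with the standard `logging` module. This is only a sketch under assumptions: it filters per logger name rather than per source file, and the component names (`parser`, `codec`), the `ComponentLevelFilter` class, and the messages are all hypothetical. The output goes to an in-memory buffer here instead of a file.

```python
import io
import logging
from typing import Dict


class ComponentLevelFilter(logging.Filter):
    """Pass a record only if it meets the minimum level set for its logger."""

    def __init__(self, min_levels: Dict[str, int], default: int = logging.WARNING):
        super().__init__()
        self.min_levels = min_levels
        self.default = default

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno >= self.min_levels.get(record.name, self.default)


stream = io.StringIO()  # stands in for the trace file
handler = logging.StreamHandler(stream)
# Print the source file, the function, and the text, as in the comment above.
handler.setFormatter(logging.Formatter("%(filename)s:%(funcName)s: %(message)s"))
handler.addFilter(ComponentLevelFilter({"trace_demo.parser": logging.DEBUG}))

base = logging.getLogger("trace_demo")
base.setLevel(logging.DEBUG)
base.addHandler(handler)
base.propagate = False  # keep the demo output out of the root logger

parser_log = logging.getLogger("trace_demo.parser")
codec_log = logging.getLogger("trace_demo.codec")


def parse() -> None:
    parser_log.debug("took branch A")  # parser is traced at debug level


def encode() -> None:
    codec_log.debug("took branch B")  # dropped: codec defaults to WARNING
    codec_log.warning("buffer nearly full")


parse()
encode()
trace_output = stream.getvalue()
print(trace_output, end="")
```

Turning on debug trace for one more component is then a one-line change to the dictionary passed to the filter.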

There's so much trace that you can't use it in production mode, even if you have a test system which isn't handling any load, so it's purely a debug tool.

> There is simply nothing that comes close to learning and debugging code line-by-line, examining variables and taking branches in real time. In fact, how people can debug large existing codebases without a debugger is beyond me.

Debuggers as a general purpose debugging tool never scaled up for me. Working out a subtle data flow or logic bug in a small piece of code, or having a post-mortem debugger in a failed test, sure, but working with a larger codebase was always very cumbersome for me.

For most bugs I fix I never fire up a debugger. Often the bug is obvious from the trace, or even just the line number of a failed assertion. If the bug is not obvious at this point I tend to do some manual analysis, looking at the code and flow. By now >98 % are fixed. Often it's more difficult and time-consuming to write the regression test case than to do the actual debugging.

Sometimes there is that rare elusive bug. But I find that thinking about it and the code works better than using a line debugger, especially for larger systems.

I also find that many "classic" tools are highly inadequate to debug distributed, threaded or asynchronous (or a combination thereof!) software -- similar to how a lot of tooling just runs into a brick wall with threaded software. (Luckily Intel VTune exists).

>Of course, nowadays in modern webdev it seems that most developers no longer use step-debuggers

You make it sound like step-debuggers were the traditional, established thing.

Whereas most programmers back in the day would get by with some carefully placed print statements (based on specific intuition) instead of stepping around to see what goes on.

"The most effective debugging tool is still careful thought, coupled with judiciously placed print statements." (Brian Kernighan).

I've used both, and still do use both in concert with each other.

The debugger is a tool for gaining that intuition about how the program operates, and carefully-placed log statements are a tool for cutting the search space of where you need to place breakpoints and single-step.

"carefully-placed log statements are a tool for cutting the search space"

Indeed, inevitably serious problems tend to be where there is no or poor logging :-)

I'm a heavy user of the excellent Visual Studio debugger for C++ and C#. But for years I didn't use an IDE or debugger for server side Python; I did a lot of Zope & Django debugging with print statements and log files. A couple of years ago I started using IDEA's PyCharm. It's massively accelerated how quickly I can fix bugs in Tornado based server code. More recently I've switched my server side coding style to use coroutines, and that requires a shift in debugging mindset as one can't make the same control flow assumptions as for conventional callback style single threaded async code. I'm looking forward to debuggers that understand how to step over yield or await.

Oh yes...don't get me wrong, there will always be room for crash-and-burn debugging (my pet name for that style), and no doubt interpreted languages make it a much faster and safer method than doing the same thing in C back in the day.

Of course memory corruption was always a nasty potential culprit in C, and once again, using a step-debugger helped with those issues immensely for a variety of obvious reasons, as you could often very quickly see the exact line where the program blew up.

BK's explicit endorsement of C&B debugging was bound to show up sooner or later, and we didn't even really touch on the difficulty of using a stepper in multi-threaded network apps, but remember I simply posited that stepping through code in real-time helped me quickly learn from devs much smarter and more experienced than I.

For me that only applied to ZX Spectrum coding; ever since I moved to MS-DOS, and also when coding with friends on their Amiga systems, I have always used a debugger.

Only when the environment forbids me from using one do I make use of such techniques.

> In fact, how people can debug large existing codebases without a debugger is beyond me.

Tracing and telemetry. Debuggers are very useful, but only when they're needed. In fact, in large distributed systems 'debugging' end-to-end bugs is usually hindered with a debugger.

As an aside, even when I have to use a debugger during runtime, the largest part of my interaction with it is heavily scripted, and the debugging process is not interactive.

Large existing codebases are always a handful. But it's the small embedded environments where you might not have a debugger that I've experienced. Or the problem is inherently realtime so cannot be slowed down sensibly. For example when debugging a USB endpoint, it would be nice to stop and examine a packet, but not responding in time causes the host to un-enumerate the device.

Twice I've had to use the technique of "write values continuously to uninitialized SRAM, then on reboot print out what you find there" to investigate this kind of thing.

The worst I ever had to do was on an embedded machine with no CPU cache. We were getting a crash, but we couldn't figure out how the code was getting into that function. So we hooked up a logic analyzer to the address bus (this was years ago, when CPUs were DIP packages). We wrote to an unused address decode when we entered the function, used that decode to trigger the logic analyzer, and were able to read back what addresses the CPU had been executing before it entered the function. With that information, everything became clear.

Debuggers like Visual Studio, windbg, gdb are excellent tools but they can't always take you to the problem, especially for multi-threaded server code. Yes, they're great for bringing your mental model of the code execution into line with the real behaviour of the code. But for nailing tricky synchronisation issues in multi-threaded server code you'll need plenty of configurable logging, and memory profilers like valgrind or purify.

Interesting point. IDEA supposedly added better support for "async stacktraces" recently-- https://blog.jetbrains.com/idea/2017/02/intellij-idea-2017-1... but I'm yet to give it a shot.

Would be good to see this in PyCharm to enable stepping over yield in @tornado.gen.coroutine methods. The whole step over step in debugging model assumes traditional imperative code and stack frame continuity.

I do a lot of back-end web service work and I really enjoy using unit test debugging; it's really fast and gives you a good clean starting point. The times that I have to hook up the debugger to the web server to debug against the website in the browser can be really frustrating and a lot more work, especially since the client-side code will call out again to retry web API calls that took too long. I'm yet to find an effective way to debug in this way without deleting breakpoints that get hit and then assigning new ones (and who knows which call exactly I'm debugging, but since the data is the same it usually doesn't matter). Does anyone have a better way to debug than the one I'm describing? I use Visual Studio and haven't found a way to easily limit it to just the first call attempt.

I agree - debugging is an integral part of programming. I don't mind debugging, and often bugs are good for you:

- solving a bug teaches you something. You either learn something new about how to write good code, about the product or about the domain.

- debugging problems teaches you how to write code that is easy/easier to debug. You see what information you wish you had had, how to be able to track what is happening in the program etc.

- solving problems and mysteries (how can this even happen) is often fun in itself (like solving sudoku or crosswords or reading murder mysteries).

- when you resolve a bug, both you and your customer are happy. It's a bit of a paradox. If there wasn’t a bug in the first place, there wouldn’t be a need to fix it, so why should they be happy? However, my experience is that they are happy to receive a bug-fix, especially if it is solved quickly

More detailed argument here: https://henrikwarne.com/2012/10/21/4-reasons-why-bugs-are-go...

"debugging is an integral part of programming"

Can't remember where I read this: programming is 20% writing bugs, and 80% fixing them.

I recently fixed a bug in fontforge to do with extracting fonts when parsing pdfs. It took me a whole day because I didn't know what I was doing (haven't worked with C since Uni over 15 years ago).

Because I didn't think to look into using a debugger, I did it very much the hard way by adding trace statements all over the place to slowly narrow down the code path. I knew I was looking for something that looked like an iteration typo, but it was a large unfamiliar codebase so it took a long time.

Got there in the end! Probably 2 days of work start to finish for a 1 character fix. Classic.

I will definitely be starting by re-learning to use a step debugger in C next time I'm in the same position.


> In fact, how people can debug large existing codebases without a debugger is beyond me.

Some people work with systems that operate in real time (networked systems are usually in this category) or otherwise can't be instrumented (embedded systems?) and cannot be replayed at will. Not everything is a desktop application or simple CRUD.

Web developers use debuggers too.

Of course, Dijkstra had his view on it. I try to adhere to his position and prefer proof by type-checking over proof by single example.

"A common approach to get a program correct is called "debugging" and when the most patent bugs have been found and removed one tries to raise the confidence level further by subjecting the program to numerous test cases. From the failures around us we can derive ample evidence that this approach is inadequate. To remedy the situation it has been suggested that what we really need are "automatic test case generators" by which the pieces of program to be validated can be exercised still more extensively. But will this really help? I don't think so." [1]

Linus used to have a similar position [2].

Debugging hides the underlying architectural problem: the failure to either communicate or abstract the algorithm in such a way that it can be understood by the programmer. If something needs debugging, it is not the bug that needs to be fixed but the architecture, so that the bug can be understood before debugging.

[1] Edsger Wybe Dijkstra - EWD303. https://www.cs.utexas.edu/users/EWD/transcriptions/EWD03xx/E...

[2] http://lwn.net/2000/0914/a/lt-debugger.php3

Thanks for this.

This seems to be basically what I'm saying through most of the article. Bugs occur because of a gap in a mental model. To fix the gap, we need more understanding. I'm not sure I would say that's necessarily a design or architecture flaw in the software.

Unfortunately, proving things correct in most popular languages is a Very Hard Problem. I do address that to some extent in the article, though. In particular, the sorts of languages that end up being easy to prove correct tend to be difficult for people to learn. Therefore, people are likely to either learn something easier, or they are likely to write code that is "correct" or "safe" but still has logic errors. I think that there are many other valid reasons to write software, and not all of them require correctness to the extent Dijkstra would like.

But I think that's a pipe dream anyway. For one, most software is not written in formally verifiable languages, or formally verifiable dialects of popular languages. Porting all software would be impossible. Second, who defines "correct"? What about bug-compatible software, where the definition of correct is imitating something else that is known to be incorrect? What about ambiguity in specifications like JSON, or undefined behavior in C (some of which can only be detected at runtime, because we don't actually have infinite tape in our Turing machines)?

What happens when there's a bug in the runtime of your formally verifiable language? What happens when there's a hardware bug? I'm not saying these are necessarily common, but regardless of your software environment, you're running on something else that can have the same bugs. People make mistakes, even when proving things. Math is great, but people make mistakes, and our tools are still bounded by the laws of physics.

No matter which way I look at this thing, I keep coming back to teaching and understanding as the only generalizable and approachable ways to address the problem of bugs. (Whether that approach is taken by proving correctness of some implementation, or post-hoc with testing and tools.)

Thanks for a great and insightful article and answer. Views like yours should be shared by a wider audience, since it accentuates a section of software engineering that is rarely discussed: understanding and adjusting the mental model of programming (or should I say: systems design?)

Applying formal methods does not automatically entail that a system is proven to be correct, which would be an exaggeration. Proponents of formal methods believe it helps to ascertain certain robustness and reliability aspects of a system. For example, I could prove a Paxos implementation correct, given the specification of a Paxos algorithm, but still use it incorrectly in a larger system, or vice versa.

Correctness is a rather vague and abstract concept and quite distinct from proof. Using a type system, we could prove that, given certain pre-conditions, large state-spaces cannot be visited. Whether it will behave correctly within the reduced state-space, the programmer can only hope. Even formal languages such as Coq rely on correct execution of the assembler, CPU and system environment.

I think what Dijkstra was aiming for was not a replacement, but more a ground-up reeducation of the field. And to a certain extent this is currently happening. Functional languages and techniques are everywhere. I hear people talk about monads, monoids and functors, others embrace reactive programming, which is the coinductive branch of programming. Coq and Idris are finally reaching a larger audience (but still small). However, the larger change is slow and requires a generational paradigm shift.

It is exactly the thing you are addressing in your article: do the users of a tool believe they are changeable and changed by the tool (Heidegger)? If they do, they also understand the need for self-protection and focus.

Teaching and understanding are indeed the underlying pillars if we want to bring software engineering to a higher level. I think the order in which material is taught greatly influences the mental model of the student. Making multiple iterations, one could start with concepts from category theory and work down through, for example, Coq, Haskell, LLVM / Assembler, Machine Language/system design, CPU Micro-ops to Digital Electronics. This would lead one to view LLVM as an implementation artefact of Haskell (which in itself can be expanded to other languages). The mental map becomes relatively machine and system independent. Unfortunately, most programmers nowadays start with an easy language that gives instant reward. That makes it difficult for the aspiring programmer in this position to understand the concessions made.

Thanks for clarifying, I agree with all of this, and they're great points!

By June 1949, people had begun to realize that it was not so easy to get a program right as had at one time appeared. It was on one of my journeys between the EDSAC room and the punching equipment that the realization came over me with full force that a good part of the remainder of my life was going to be spent in finding errors in my own programs.


"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"


If you have to use 'clever' hacks, the least you can do is justify it in the comment

And verify it with good tests. It sometimes happens that a certain piece of code is inherently complex, due to a complex underlying algorithm. But often, test cases are still easy to define, so it makes sense to heavily make use of them.

For example, sorting algorithms may be somewhat complicated in implementation, but it's trivial to come up with some sample before/after lists.
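As a rough illustration of how cheap such test cases are, here is a sketch in Python. The `insertion_sort` function is only a hypothetical stand-in for whatever complicated implementation is under test; the before/after pairs are the part that matters.

```python
def insertion_sort(items):
    """Hypothetical implementation under test: returns a new sorted list."""
    result = []
    for item in items:
        # Walk left from the end until the right slot for item is found.
        pos = len(result)
        while pos > 0 and result[pos - 1] > item:
            pos -= 1
        result.insert(pos, item)
    return result


# Sample before/after lists: easy to write down even if the algorithm is not.
CASES = [
    ([], []),
    ([1], [1]),
    ([3, 1, 2], [1, 2, 3]),
    ([2, 2, 1], [1, 2, 2]),
    ([5, 4, 3, 2, 1], [1, 2, 3, 4, 5]),
]


def check_sort(sort_fn):
    """Return the (input, expected, actual) triples that failed."""
    failures = []
    for before, after in CASES:
        actual = sort_fn(list(before))
        if actual != after:
            failures.append((before, after, actual))
    return failures
```

Calling `check_sort(insertion_sort)` returns an empty list when every case passes, and the same table can be pointed at any other sort function.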

Fortunate that a good programmer will get more clever over time.

Also will know when to write boring and/or testable code. Testable code is also more easily debuggable.

Debugging an API, which may have obscure corner cases or usability quirks, seems appropriate to me. "Debugging" an algorithm which has inherent logical flaws by attempting to patch it up with case-based reasoning and exceptions (logical, not runtime) is a road to disaster. I have seen too many devs trying to hill-climb their way out of a logical hole and the result is invariably a mess of a program which is still wrong.

Also, in the case of concurrent and parallel code, debugging is basically useless since realistic timing conditions are not reproduced.

I personally prefer logging, which gives a faster and more targeted trace of program behaviour than step debugging.

My secret sauce to debugging is formulating hypotheses about what went wrong, then instrumenting the software to validate the hypotheses. By instrumenting I mean manipulating running software to give me information.

Sometimes the manipulating step is made unnecessarily hard, and the trick is to force the hand. I see myself as an inquisitor of software and use programmatic torture to get what I want to know. That's the central attitude to debugging.

The devil is in the details; there are many ways to get information: using a runtime debugger, inserting output statements, reading the logs, using LD_PRELOAD, using strace, and so on.
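Of the options listed, inserting output statements is the easiest to sketch. The Python below is one possible way to do it, not the only one: a small decorator that logs every call and result of a function under suspicion. The names `traced` and `parse_port` and the logger name are made up for illustration.

```python
import functools
import logging

log = logging.getLogger("debug.trace")  # logger name is arbitrary


def traced(fn):
    """Log the arguments and the result of every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", fn.__name__, args, kwargs)
        result = fn(*args, **kwargs)
        log.debug("return %s -> %r", fn.__name__, result)
        return result
    return wrapper


@traced
def parse_port(text):
    """Hypothetical function whose behaviour is in question."""
    return int(text.strip())
```

With `logging.basicConfig(level=logging.DEBUG)` in place, each call to `parse_port` leaves two log lines, which is usually enough to confirm or kill a hypothesis about what the function was actually given.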

My favorite debugging trick (when faced with an 'impossible' bug) is to state my assumptions, it's usually clear when they're all laid out which ones are conflicting.

Sounds like Rubber Duck Debugging. A simple, yet effective technique. http://wiki.c2.com/?RubberDucking https://blog.codinghorror.com/rubber-duck-problem-solving/

Having just read the recent article on HN about Japanese workers "pointing at things", I got to thinking that the two actions probably stimulate a similar area of the brain. Sometimes we just need to get "out of our head" to make a breakthrough in the problem we're trying to solve.

It is no longer Rubber Duck when you actually spell out the assertions. It becomes a form of very good and useful documentation and sometimes runtime verification.

Sounds like a variation on Ayn Rand's "contradictions do not exist, if you think you see one, check your premises, some of them will be wrong".

Is that quote intentionally saying that you have contradicting premises, and by extension is the statement in that sentence that contradictions do not exist contradicted by the later part of the sentence, or is there something about the sentence that I'm not understanding?

It reads like that at first. What it's saying is for example: Assume x is 5. Assume function TimesFive multiplies things by 5.

When we do TimesFive(x), my answer is 50 instead of the expected 25. I check the TimesFive function and it's just "print x*5". But how is that possible? 5x5 is 25!

The problem in this case is my assumption that x would be 5 was wrong.
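In code, that example might look like the sketch below (Python; `times_five`, `load_x` and `compute` are names invented for illustration). The function is fine; the unchecked premise about x is what breaks.

```python
def times_five(value):
    """The function under suspicion: it really does just multiply by 5."""
    return value * 5


def load_x():
    """Hypothetical source of x. The premise was that this returns 5."""
    return 10


def compute(check_premises=False):
    x = load_x()
    if check_premises:
        # Spell out the premise instead of carrying it around in your head.
        assert x == 5, f"premise violated: expected x == 5, got x == {x}"
    return times_five(x)
```

`compute()` returns the "impossible" 50, while `compute(check_premises=True)` fails immediately with a message that names the wrong premise.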

I've had this issue before in my code (though not as trivial an example of course), but it's always a good reminder of something to check when troubleshooting.

Which sounds like a variation on various Socratic learning techniques.

Or if it's not clear, you can start testing/tracing/logging to see which assumption isn't valid.

>Are the arguments to memcpy in the order of source, destination, length; or are they destination, source, length? Is strstr haystack, needle; or is it needle, haystack? Many individuals develop a habit of consulting references (such as system manuals) as soon as the question comes up.

I've always found this particular problem solvable by letting programming languages have syntax in the form of

    public void CopyFrom(int address)To(int address)OfSize(long size) { // impl }
instead of

    Copy(int fromAddress, int toAddress, long size)
Are there good reasons why languages haven't adopted this syntactic nicety?

Some JavaScript frameworks use method chaining in-order to give you something very similar to what you are talking about.

  $('#my-div').css('background', 'blue').height(100).fadeIn(200);
Personally I don't use JavaScript frameworks very often so I'm still not very used to this.

The first time I encountered it I didn't like it at all but now I think it's ok.

Something else that also enables code similar to what you are talking about is using named arguments in Python.

  somewhat_similar(barfoo=thing, using_fnord=mahjong, baz_provider=xyzzy, quux_helper=fob)
but personally the way I use named arguments is only for the select few arguments that will sometimes appear and sometimes not in combinations that make the ordering alone insufficient to handle this.
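For the memcpy-style ordering problem specifically, Python's keyword-only parameters get fairly close to the syntax asked for above. A minimal sketch, where `copy_bytes` is a made-up function and not a real library call:

```python
def copy_bytes(*, source, destination, size):
    """Copy the first `size` bytes of `source` into `destination`.

    The bare `*` makes every parameter keyword-only, so a call site
    cannot silently swap source and destination.
    """
    destination[:size] = source[:size]
    return destination


src = bytearray(b"hello")
dst = bytearray(5)
copy_bytes(source=src, destination=dst, size=5)
```

A positional call such as `copy_bytes(src, dst, 5)` raises a TypeError, so the order question never has to be looked up.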

In my code regardless of language, I generally try to order the argument with target as the first argument, source(s) second and additional arguments following that.

Somewhat related to these things is also the fact that I use constants and enums for values rather than "true" and "false" when calling functions that take booleans as arguments.
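A small sketch of what that can look like in Python, using the standard `enum` module. The `Overwrite` type and the `save` function are hypothetical examples, not from any real codebase:

```python
from enum import Enum


class Overwrite(Enum):
    """Hypothetical option type standing in for a bare True/False flag."""
    ALLOW = "allow"
    FORBID = "forbid"


def save(store, key, value, overwrite):
    """Write value under key; refuse to clobber unless explicitly allowed."""
    if not isinstance(overwrite, Overwrite):
        raise TypeError("overwrite must be an Overwrite member")
    if key in store and overwrite is Overwrite.FORBID:
        return False
    store[key] = value
    return True
```

At the call site, `save(cache, "a", 1, Overwrite.FORBID)` says what it does, where `save(cache, "a", 1, False)` would need a trip to the docs.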

As I noted in another comment I generally work on small programs at a time. This makes it generally ok for me to remember or to find the arguments taken by a function even though I don't have autocompletion.

I have a stray dash in the above comment. Too late to edit now. Where I said "in-order" I meant to say "in order".

Obj-C uses this. C, on the other hand, allows prototypes to be declared as:

    void* memcpy(void*, const void*, size_t); // no argument names

Which is the worst IMHO. The autocomplete cannot even tell you which is which.

And Swift, and Smalltalk. 3 generations of that one family.

It leads to very verbose code, which is perhaps why it hasn't caught on in other languages. It also can be surprisingly clumsy when a method is called from multiple call sites (which may apply it to different parameters in subtly different contexts, which can make the original phrasing awkward), or when a method is passed around as a first-class value.

Personally I use prototype definition without argument names because when I change the name of an argument in the C file I don't have to go to the header file and change the prototype. The code I work with in C is generally small enough that I have all the relevant C files open in different windows while working on the code, and furthermore because my programs are quite small I don't use autocompletion. What I do have however is simple completion that is not sensitive to the context but which will complete a word once I've typed part of it. This way I can have meaningful names for things without having to spend too much time when I write them.

Surely having your IDE bring up the argument names during autocomplete solves this?

Dependence-on-IDE is a liability.

Ability to find and read the docs: an asset.

Yes, I don't see this point either. A 'man 3 stpncpy' or an alt-space seems rather similar to me. Actually, with the 'man 3 stpncpy' we cannot be sure the results will align with the target platform.

Everyone uses an IDE; whether it is Emacs, Vim (with plugins), IntelliJ, Eclipse, Atom or Xcode doesn't matter.

Yes, I also use whiteboards, A4's, a notepad, mindmaps (Freemind), post-its and more. The creative parts of SE, such as architecture and design, do not blossom behind a computer.

You don't think the IDE automatically bringing up the documentation for the function arguments is more efficient here than typing terminal commands for each function you're using? I don't understand why people insist on using bare bones text editors when IDEs with more automation exist.

Yes, perhaps I was not clear. I do think an IDE is more efficient. Actually, I believe a modern IDE is barely scratching the surface of what could be possible. I am one of the lunatics who believe in editing directly on the AST, forgoing the filesystem and working on one shared merkle-tree-like database with function signatures as the primary entry point for search.

> Dependence-on-IDE is a liability.

Why? Use the best tool for the job.

> Ability to find and read the docs: an asset.

Read the docs + use a good IDE.

What's the point of having two names for each parameter instead of just one? Just allow using named parameters in calls, this Objective-C-ism is ugly as hell.

How would you call your function and how would you refer to the arguments inside the function?

Udacity actually has a course on systematic debugging taught by the creator of DDD.

The course covers the importance of the scientific method and hypothesis testing, divide and conquer approaches, etc.

Nice article. Thought I'd note something:

> Software developers spend 35-50 percent of their time validating and debugging software.1

This is a strong claim, so I looked at the cited source, given the unfortunate experience of having seen many strong claims backed by weak underlying data.

> The research phase will predominantly comprise of short interviews with approximately 10 to 12 organisations that compile code on the LINUX operating environment. These organisations will also fill in the Cambridge Venture Project (CVP) survey to help quantify how much time they spend debugging with and without RDBs. Broader research in the form of the CVP survey with potential users of reversible debuggers will be conducted to gain insights on how much time is currently spent debugging.

Further down:

>49.9% Programming time spent debugging

> According to 54 questionnaire responses to the CVP survey and 11 interviews, based on the question ‘Of the time spent programming, what percentage of your time is spent on the following: (1)fixing bugs (2)making code work (3)designing code (4) writing code.’ (1) and (2) then were grouped as debugging.

While this might be indicative of a widespread phenomenon, it doesn't seem to be enough evidence to support the factual and strong claim of the first sentence in the ACM article. There certainly isn't enough information in the cited source alone to decide (and given that, is this really sufficient as a source?).

Perhaps I missed further evidence?

That being said, I won't throw the baby out with the bathwater. Debugging is clearly a time sink in industry, regardless of what the actual percentage is.

I enjoyed this piece overall. I was previously unfamiliar with some of these terms -- specifically incremental vs. entity mindsets -- and it shed some light on work issues I've seen (coincidentally in the debugging technology space). Being someone with an incremental mindset and having worked for someone with an unjustified entity mindset, I found the environment suffered from almost all the problems (and more) you described associated with the latter mindset. Granted, this is anecdotal.

I quite strongly agree that improving the process and mindset behind problem-solving results in a significant improvement to specific skills like debugging software. It's refreshing to see an article like this instead of the many others hyperfocused on specific, less-abstract techniques (and widely-applicable abstract techniques/processes are the most effective use of one's learning time).

Thank you for the feedback. I chose (for better or worse) to cite the most recent article as opposed to maybe the least controversial one. Other papers I've read have suggested similar ranges; nobody presents a stddev so it's hard to compare. And how you classify "debugging-related tasks" is kind of subjective, as you correctly point out. So linking other papers might or might not be useful anyway.

In "A framework and methodology for studying the causes of software errors in programming systems" (Ko & Myers 2005), they cite a NIST publication as saying 70-80% of time is spent debugging. I parroted this off in a talk once, and then I looked at the same publication ("The Economic Impact of Inadequate Infrastructure for Software Testing" RTI 2002) and the only 80% I found was time spent in testing software in the early days of software engineering. In the 1990s, it finds coding and unit testing to comprise about 30% of the time. This is, as far as I can tell, at least similar to the (1) and (2) in the CVP survey. But again, indirect comparisons, who knows?

In tables 6-4 and 7-6, the RTI study finds different numbers for time spent on bugs, and different frequencies of where bugs end up being discovered. Whatever the rate, it is clear that post-production discovery bugs are the hardest to fix, and I think this is because you only ship code when you have high confidence in its correctness.

I think this is a hard measurement to make. It's hard to find participants, hard to guarantee they have the same idea of what "debugging" means, hard to quantify. So thank you for calling me out on this.

But I think other studies have evidence that this isn't too far off from a correct measurement. For example, Gugerty and Olson in 1986 did some novice vs expert study at debugging LOGO programs. (I don't like the Dreyfus model for classifying debugging, but whatever.) They found that novices took about 18 minutes to fix bugs, testing an average of 3.6 hypotheses per program, with the first hypothesis being correct only 21% of the time, and re-executing code every ~3 minutes. Experts took 7 minutes, testing an average of 1.6 hypotheses per program, with the first hypothesis being correct 56% of the time, re-executing code every 2 minutes. So guesses are only right maybe half the time if you're really good.

Basically any novice vs expert study compares how much time people spend on the problem (they're trying to reduce debugging time), but it's hard to extrapolate this into percentage of development time. For example, some people spend 100% of their time debugging because they're on QA or sustaining engineering teams.

To be quite honest, there _fundamentally is not_ enough research into the practice and pedagogy of debugging. I have about 60 papers in my "Debugging Papers" folder that I've read over the past year, and basically everything up until the Dweck-inspired research in 2008 isn't great. There are only 9 papers in all of computer science that cite Dweck that I've found as of maybe 6 months ago (I'm not a researcher and I don't look that often). From memory, two of them were recommending further research based on her work, maybe five of them were from people in CS departments helping Dweck create games and programs to perform her research, and then only two were actually applying her work to debugging.

I would agree with you if you said this needs more research, and that the claim isn't strongly supported by the provided evidence.

I greatly appreciate your feedback.

I think it's fascinating to see how my friends at university (who previously had no programming experience) have started to "get" debugging and the benefits of using something like a step-debugger. It took several programming courses before they started to actually debug the code instead of just reading it and trying to figure out what happens. I mean, reading the code is also an integral part of debugging, but often times they said things like "fooBar _should_ be true, so we _should_ enter this conditional". Then it of course turned out that fooBar was in fact not true because of some logic bug or something they didn't even consider.

I had the same experience at University. We didn't learn how to use a debugger and rather relied on logging data to see the flow of the program. That, plus the fact that our code was a big mess since we were just starting, didn't make it easy to find the more obscure bugs. I was really impressed the first time my friend fired up Netbeans and showed me how he could find the value of the variables at runtime, at any point in time of the execution.

Today I find it unacceptable when I'm in a situation where I can't debug a program I'm working on, it just seems like a waste of time.

The problem I had at the university was that our code was so simple relative to large codebases that a debugger was generally not necessary for me. We learned how to use it and I found it to be a burden because I could find the bug "much quicker" either with 1-2 print statements or just by reading it. After working with much bigger problems I realized the importance of debuggers and really wished they were stressed even more than they were in my university days.

I personally found this to be a poor article on the subject. There's very little actual content about debugging, he spent more time talking about theories of intelligence than debugging.

I've recently been thinking about this, thinking about making a course. It's my personal experience that his approach is flawed. Fundamentally, step one is not develop a theory. That is the worst thing you can do, it can waste a lot of time. I still sometimes guess what the problem is, but all I really use that for today is where to look and that's sometimes a waste of my time as I skipped step 1.

Step 1 is recreate it. No theorizing, no thinking, no faffing around. Can you recreate the bug exactly? If you can't, find out why. Talk to the bug reporter, find out what they were doing, this is a people skill you need to learn as a programmer. Maybe you need access to the live data to see the state an object is in. There's only a tiny percentage of bugs (usually race conditions) that can't easily be recreated. If you can't recreate it, that's when you turn to logging to try and catch the conditions to recreate it, but logging is the last thing you should try and I rarely ever have to add it (we're talking 1% of bugs). I often see junior programmers littering code with log statements when trying to debug, I feel this is a bad method, but can understand the temptation, especially when they don't really understand what the code is doing (an understandable position I was in in my earlier days).

Step 2 is isolate it. If the code is simple and it's obviously one method, no further work needed. If the code is complex, you need to really isolate the exact problem. Say you've got the wrong value displaying for a basket total, where is that coming from? Is it not updating, or is the total function the problem? Isolate the exact step in the process that's going wrong. Sometimes it's hard to isolate and logging is needed again, but same advice as before, logging is a rare necessity.

Step 3 is make it easy to run a recreation. If you can run it in 10 secs or less, that's great. If it's one method of a complex web call, make a test endpoint that only calls that method that's failing and exactly recreates the conditions. Then you can run it quickly without all the cruft. In an extreme case, this can be taking the code completely out of the context and putting it in a tiny standalone app.

Step 4 is read + understand the code. Is there an obvious logic bug? Don't jump to conclusions. Still don't get what's going on? Step through the code with breakpoints. Understand what's going wrong. Breakpoints and stepping are fine, as is inspecting variables, etc.

Step 5 is fix it. This should be fairly simple now. This is also the time to think about the future. Is this part of the code a constant headache? Is it buggy because it's over-complicated? It might need a bit of a refactor to make sure this doesn't happen again. This is a judgement call, unnecessary refactors are expensive, might even introduce new bugs, but over-complicated code will lead to more bugs. It not being written in your favourite style is generally not a legitimate reason to refactor.

Step 6 is testing the fix, recreating the conditions from step 1. Don't rely on QA if you have them. Ensure you've done your job.

(Step 7 get rid of your logging statements + test it again! You just changed the code)

Now you can skip steps, be far less formal about it. Sometimes it's obvious. Sometimes it's hard and you may need to iterate over 4/5/6 a few times. But never, ever skip steps 1 + 6. Recreate it. Test your fix. Because if you skip those steps, and you "fix" it, you don't actually know it's fixed.

I 99% agree with you, I would 100% agree with you if you put step 6 between steps 3 and 4. In my opinion you should always (where possible) be able to deterministically and automatically recreate a bug before you even think about fixing it, the automatic part being your test. I say this because human nature being what it is you will forget how you recreated the bug, or your recreation will have a bunch of extraneous stuff in that really isn't needed.
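To make "the automatic part being your test" concrete, here is a minimal sketch using Python's standard `unittest`. It borrows the basket-total example from the parent comment; the `basket_total` function and the data in the test are invented for illustration.

```python
import unittest
from decimal import Decimal


def basket_total(lines):
    """Hypothetical function under repair: sum of price * quantity.

    `lines` is a list of (price, quantity) pairs with Decimal prices.
    """
    return sum((price * qty for price, qty in lines), Decimal("0"))


class BasketTotalRegression(unittest.TestCase):
    def test_reported_basket(self):
        # The exact data from the (made-up) bug report, nothing extra.
        lines = [(Decimal("0.10"), 3), (Decimal("2.50"), 1)]
        self.assertEqual(basket_total(lines), Decimal("2.80"))

    def test_empty_basket(self):
        self.assertEqual(basket_total([]), Decimal("0"))
```

Written before the fix, the first test should fail in exactly the way the bug report describes; once it passes it stays in the suite, so nobody has to remember how the bug was recreated.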

Unless you can prove the kind of nondeterminism which triggers the bug. Sometimes it is much harder to write a testcase to reproduce a nondeterministic bug than proving formally that it isn't possible after the fix.

I fully agree with all of this for code. However I have often thought the same techniques and mental model approach for debugging any complex system could be boiled down. I also attempted to write a course along the same lines and kept throwing it away.

In addition to your steps though, I often include a hypothesis combined with various likelihood probabilities (sure there COULD be a JVM bug, but is it likely?), and I attack highest probabilities first.

Thanks for saving me the trouble of writing this post. ;-)

I'll just add this: once you get to Step 3, conditional breakpoints are your best friend. Sometimes they're too time-consuming and you have to resort to manually "instrumenting" the code to get the conditional breakpoint, but the principle is the same.
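A sketch of the manual "instrumenting" variant in Python, assuming a made-up `process_orders` function. The condition is written as an ordinary `if`, and the built-in `breakpoint()` (Python 3.7+) does the stopping:

```python
def process_orders(orders, on_suspect=breakpoint):
    """Sum order amounts, pausing only on the suspicious case.

    `on_suspect` defaults to the built-in breakpoint(), which drops into
    the debugger; passing another callable makes the hook testable.
    """
    total = 0
    for order in orders:
        # The hand-written "condition" of the conditional breakpoint.
        if order["amount"] < 0:
            on_suspect()
        total += order["amount"]
    return total
```

The loop runs at full speed for the thousands of normal orders and only stops on the one that matches the condition. Like any temporary instrumentation, it should come out again before the fix is committed.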

And all bets are off when considering realtime behaviour such as concurrency or fairness. Neither debugger nor logs will help (they change behaviour and reproducibility is low) so you have to get to the basics and make a formal review. As in describe the expected algorithm, its preconditions, postconditions and then verify if this contract is met.
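Writing the contract down can be as simple as the Python sketch below. It does not replace the review described above, and assertions alone will not catch a race; it only shows one way to state preconditions and postconditions so there is something explicit to verify against. The `transfer` function and its contract are a hypothetical example.

```python
def transfer(balances, src, dst, amount):
    """Move `amount` from balances[src] to balances[dst].

    Contract (hypothetical example):
      pre:  amount > 0, src != dst, balances[src] >= amount
      post: the total across all accounts is unchanged,
            and no balance is negative
    """
    # Preconditions
    assert amount > 0, "amount must be positive"
    assert src != dst, "source and destination must differ"
    assert balances[src] >= amount, "insufficient funds"

    total_before = sum(balances.values())

    balances[src] -= amount
    balances[dst] += amount

    # Postconditions
    assert sum(balances.values()) == total_before, "money created or lost"
    assert all(b >= 0 for b in balances.values()), "negative balance"
    return balances
```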

Yep, know it well, there's a whole class of applications that don't play well with a debugger. But, most developers aren't working on those applications...

Most developers do not work on user-facing concurrent code? Such as user interfaces?

No, not normally. It's fairly rare, and when it does crop up, it is typically handled with pre-baked solutions that are used to spawn a worker thread/process for the purposes of off-loading work from the main UI thread, with minimal communication with the main UI thread via posted messages.

Thanks for your feedback. This article was largely intended to explore why some folks just don't seem good at debugging, and don't seem to get better at it. This is where self-theories comes in. It does a rather good job of explaining that and has a large body of evidence supporting its validity. And importantly, that we can actually do something about that. I mean, in bold, at the top, it says "Understanding the psychology of learning strategies leads to effective problem-solving skills." So I think you should have been prepared for an article that wasn't going to talk about debugging strategy in depth.

I think you might have misunderstood what I meant with "develop a general theory." I say this because you've changed it to "develop a theory," which is explicitly not what I actually wrote. It literally means something completely different in the context of science. A general theory is just a concept. It's the formation of ideas. It's the process of information gathering. Where you go from "this is a bug" to "this is a bug in the logging module" to "this is a bug in the logging module when I'm producing more log messages than I can write" to "this is a concurrency bug" to "aha the actual bug is somewhere else, let me look there", etc.

Your first tangible goal should be reproducing the bug, because that is how a hypothesis is tested. But you don't magically start with a hypothesis unless you have very good intuition, which means this isn't a new bug for you, and so you didn't really need to employ any complicated strategies. My article is not about how to handle obvious, run-of-the-mill bugs.

And so between the implicit step 0 (there is a bug) and step 1 (I can reproduce it), you end up assuming quite a lot.

I mention in the article that reproducing a bug is really crucial, because that is how we know we've found where the gap in our mental model is. You've gone on to describe some ways to do that. I explicitly did not take this approach, because such a list would be exhaustingly long, and my article is already long for an ACM Queue article.

If you've reproduced the bug, you must have already figured out what was wrong. How did you do that? Were you staying up all night staring at the problem instead of taking deliberate breaks? Were you collaborating with colleagues or stubbornly tackling the problem alone? Were you scientific in your approach? These are all things that stem from the root of the problem, and are maladaptive strategies that folks engage in. Strategies that prevent them from getting from step 0 to step 1.

So it seems odd to me that you would call my "approach" flawed. To me, it seems you're not actually starting from the point of "I do not understand the problem." If you go to github, pick a random issue in a random project in a random language, and go to fix it, what is the likelihood that the first actual step you take will be reproducing the bug? This is where people actually start from, and this is where we are every time we encounter a new bug.

I'm not sure I agree with the order of your steps. If you are new to software (new job, entirely new codebase), and one of your first tasks is a bugfix, how are you possibly going to have step 4 ordered before step 1? You need to have some kind of mental model. You're going to need to talk with colleagues, interact with people. Gather some understanding of the system architecture. This is a learning process and an information sharing process. _This is the point of my article._ Thank you for taking the time to share your experience with us!

I'm sorry you didn't like my article. It sounds like we agree more than you seem to think we do, even if I'm a bit critical of your steps. I think we have different audiences in mind, perhaps.

The paper is almost entirely focused on incremental learning. It then throws a very brief outline about debugging at the end. What is outlined is pretty much step 5 in my list. I feel that it's twisting the actual content of the article to assert that it naturally follows that "your first tangible goal is to be reproducing because that's how a hypothesis is tested". If that was the belief, why not mention it? It really is lacking a tangible framework for entering a debugging mindset.

The words recreate or replicate don't appear in the article. Reproduce is used once at the bottom of the article in relation to the actual advice ("so it seems like the bug has been reproduced at this point"), and once tangentially in the discussion about tools. How can there be a discussion about a debugging mindset without those words?

As for step 4 coming before step 1? I feel talking about one month in an entire career is moving the goal posts. The abstract talks about how much time all programmers spend debugging, with no talk about the first month of a programmer's career. And it talks about teaching debugging, a mindset to put in place before that first job.

That is what my steps were attempting to describe, a method for teaching how to enter a debugging mindset.

Yes, the paper is almost entirely focused on learning because it is a paper about applying somewhat recent research in social psychology into the area of debugging. It is not a paper about debugging. I can see how you might be disappointed if you expected a paper on debugging, but that is not what this is, and it says so at the top, in bold letters, above the abstract.

Anyway, I disagree with the rest of the points you make here, but I don't really feel like arguing about things that my paper is not (and was not) intended to be. I'm sorry you couldn't get much out of it, and thank you again for your feedback!

To make debugging easier you have to pick tools (languages, IDEs, methodologies) that make it easier. In other words you optimize for debugging.

For example, I find Erlang code very easy to debug. There are a few reasons for that:

* Built-in runtime tracing is easy. Can remsh into a VM node and trace any function at runtime.

* Functional style with immutable data and variables makes it easy to figure out what is changing in the code. Because of immutability, the state is always updated explicitly. For me, that makes it much easier to understand what is happening than figuring it out from a class inheritance hierarchy with method overrides and such in an OO language.

* Code hot-patching makes it easier to test and iterate to zoom in on a bug. With care it can be done in production. You see a strange failure you never saw before and wish you had an extra log statement in there? That's easily done.

* Language is simple and self-consistent. That means there is less chance someone used some obscure new feature I haven't heard about. This is a bit like C vs C++. C is very simple at the language level. You can still have hairy code with triple pointers, funky dispatch tables with Duff's Device thrown in there but at least you can read through it. Looking at C++ code I sometimes couldn't even parse what was happening because of a language feature (so they overrode the operator some place, and using templates, ...). Or someone created a macro and then everything is more interesting all of a sudden.

* Process heap isolation. When something happens with one process, I know for a fact that it hasn't scribbled on memory of other processes. I can restart that process if needed and test my hypothesis of what I think is wrong, often without affecting the rest of the system (even while it is running in production).

* Sane concurrency primitives: Having the ability to create hundreds of thousands of processes with isolated memory often means a smaller impedance mismatch with your business code, so you have less total code to write. The less code you write, the less code you have to debug.

Anyway the point being, debugging should be a trade-off in picking your tools. People don't think about it much and focus on other things first. But it will be something that over time will make a tremendous difference.

> Virtual machine-based languages, interpreted languages, and languages with runtime environments encourage users to view the execution environment as a black box. The goal here is to make programming easier by reducing the scope of the mental model the programmer must maintain. When bugs occur in these execution environments, you're left with a complete gap in understanding.

It can be the opposite as well. Having a VM with good tracing and debugging facilities is very useful. Often you can do things you wouldn't be able to with just plain compiled code.

Debugging is essentially a domain-specific term for problem solving. Bugs are described as word problems. Since mental models are approximations, they are in some cases inaccurate, leading to unusual behavior in software when it is built on top of flawed assumptions.
