

Why debugging is all about understanding - staltz
http://futurice.com/blog/why-debugging-is-all-about-understanding

======
mhomde
Over the years I've become more and more convinced that debugging has done a
lot to develop my problem solving skills and analytical thinking in general.
It's like being a full-time modern times Sherlock Holmes with the crime scene
and tools neatly at your disposal.

Usually breakpoints and stepping through code solves things pretty easily but
some bugs can be real stumpers.

Here's some of the tools I tend to use:

1\. What has changed? What could have caused this bug to appear?

2\. Verify fundamentals: Is it actually running the code I think it is? If I
change something that should change something, does it change? If I break
something, does it break?

3\. Verify Input: Make sure it's valid

4\. Verify output

5\. Timing issues?

6\. Slowly remove code "around" bug until it works again

7\. Or start back at something you know works and add back code until it
breaks.

8\. Break out the functionality into an isolated environment so you can test
it separately so you'll know if the problem is within the feature or a side-
effect of something else

9\. If your code makes it too complicated to see and find the bug refactor
until it isn't.

10\. Go to sleep / Go on a walk. Let your subconscious process it

and finally, lots of bugs are made self-evident by well structured and
refactored code, so don't get too busy working on the symptoms rather than the
root cause.
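
Points 6 and 7 in the list above amount to a search over changes. As a rough sketch only, assuming the changes can be applied in order and that the bug stays present once the offending change is in, the search can be done by bisection. The names `first_bad` and `is_bad` are made up for illustration; `is_bad` stands for whatever check reproduces the bug.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changes: Sequence[T], is_bad: Callable[[Sequence[T]], bool]) -> int:
    """Return the index of the change that introduces the bug.

    Assumptions (not checked here): applying no changes is good, applying
    all of them is bad, and once a prefix is bad every longer prefix is bad.
    """
    lo, hi = 0, len(changes)  # prefix of length lo is good, length hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(changes[:mid]):
            hi = mid  # the bug is already present with the first mid changes
        else:
            lo = mid  # still working, so the culprit comes later
    return hi - 1


# Hypothetical usage: change "d" is the one that breaks things.
culprit = first_bad(list("abcde"), lambda applied: "d" in applied)
print(culprit)  # 3
```

This needs roughly log2(n) runs of the check instead of n, which is the same idea `git bisect` applies to commits.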

~~~
mattmanser
1\. Can you consistently replicate the bug?

2\. Are you sure you can consistently replicate the bug?

~~~
dr_zoidberg
Those should have been 0 and 0.1 in his list. Agree with all 12 points. My
little grain of sand: use all the tools available to avoid bugs: a good IDE
and/or code linters/inspectors can work wonders on a codebase. I've seen
people still programming with "just a text editor" (not Sublime, vi, emacs or
any powerful text editor, I'm talking little less than Notepad with syntax
highlighting) telling you "it's just I don't like IDEs" when asked about it.

~~~
yoz-y
I'd even dare to say that it is better to shove Vi/Emacs experience into a
real IDE rather than the other way around.

I really like Vim, I use it for almost everything, however when I am working
on a large C++ codebase I tend to use an IDE that has full support for visual
debugging, code navigation and so on. I am always amazed when some more
experienced programmers than me spend valuable time grepping files and using
GDB on command line.

~~~
klibertp
> I am always amazed when some more experienced programmers than me spend
> valuable time grepping files and using GDB on the command line.

I don't know about Vim, but Emacs works very well as a visual debugger, for
quite a couple of languages. It looks like this for C, for example:
[http://www.inet.net.nz/~nickrob/gdb-
ui.png](http://www.inet.net.nz/~nickrob/gdb-ui.png) (it's Emacs debugging
itself in this screenshot).

In general both Vim and Emacs can _become_ IDEs that are on par with other
offerings. For Python development, I worked with Komodo, PyCharm, Vim and
Emacs. With jedi[1] both editors get context sensitive autocompletion (for
some strange reason called "IntelliSense" or something by some) and "find
definition", "rename identifier" etc. With Rope, they both get nice
refactoring support. With Magit/Fugitive, you get the prettiest and most
functional GUI for git. With Speedbar/TagBar[2] you get classes outline. With
yasnippet/vim-snipmate you get configurable, programmable snippets/templates.
And so on and on - all that on top of largely superior editing model that is
being developed and improved for 30 years.

I worked with Komodo, PyCharm and a couple of other IDEs, then switched to Vim
and I didn't feel that I'm missing something. Then I switched to Emacs because
I wanted an editor that I could easily customize as much as I'd like, and VimL
just didn't cut it. There is nothing comparable to Emacs in terms of
extensibility, LightTable and Atom may get there with time, but I suspect it
will take quite a few years. No other IDE even tries to approach this level of
extensibility and customizability.

[1] [https://github.com/davidhalter/jedi](https://github.com/davidhalter/jedi)
[2] Vim version:
[https://i.imgur.com/Sf9Ls2r.png](https://i.imgur.com/Sf9Ls2r.png)

~~~
yoz-y
While you can indeed transform Emacs/Vim to a good IDE, there are a lot of
(experienced) people who do not. Anecdotally I have seen quite a lot of posts
boasting use of 'vanilla' editors without plugins.

Regarding your GDB screenshot, that is certainly good, I would argue though
that using pure text (with some images included) is limiting oneself.

This being said, I really hope that NeoVim will go far enough and bring good
enough integration of Node.js and javascript in general (my current work uses
these technologies). So far I have been quite disappointed by stuff like
tern[1]

[1]: [http://ternjs.net](http://ternjs.net)

------
henrik_w
+1 for encouraging the use of logs as a debugging tool - they're often better
than a debugger in my opinion [1].

Also, bugs aren't only bad - you learn a lot from them too [2].

[1] [http://henrikwarne.com/2014/01/01/finding-bugs-debugger-
vers...](http://henrikwarne.com/2014/01/01/finding-bugs-debugger-versus-
logging/)

[2] [http://henrikwarne.com/2012/10/21/4-reasons-why-bugs-are-
goo...](http://henrikwarne.com/2012/10/21/4-reasons-why-bugs-are-good-for-
you/)

~~~
megaman22
I've found that it is often impossible to debug multi-threaded code in a
debugger. Hitting a breakpoint and single-stepping through code on one thread
smashes any other threads that are operating and expecting the thread that is
stopped to respond in a reasonable timeline. It can also cover up race
conditions, and just generally dork up anything that is time-related.

Printf debugging forever! ;-)

~~~
exDM69
If you have a race condition in a multi threaded program, a simple printf is
enough to screw the timings and make the bug disappear, it's not at all better
than breakpoints and stepping in a debugger.

In my experience, the only way to produce safe multi threaded code is to
isolate the threading parts and synchronization and stress test them early on.
At this point I use a combination of prints, random delays and simple asserts
on invariants that the code should hold. Most issues are reproduced within
about 10 seconds of 100% CPU core utilization. Some nasty corner cases may
require minutes of grinding away.

Multi threaded debugging is an unsolved problem. There are no universal
solutions to this problem. Conventional tools often fail so badly that it is
best to write and test your multi threading / synchronization code in
isolation, and then apply it to the actual workload, which is tested in a
single thread.

Another option is trying to build a formally verified model ahead of time
(e.g. using Spin - www.spinroot.com) and then write the actual program after
the model has been verified.
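
A minimal sketch of the "isolate the synchronization, then stress it with random delays and invariant asserts" approach described above. Everything here (`Counter`, `stress`, the thread and iteration counts, the delay range) is an illustrative assumption, not anyone's actual test harness.

```python
import random
import threading
import time


class Counter:
    """The only piece of shared state, with its locking kept in one place."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:
            current = self.value
            # A random delay inside the critical section widens the window
            # in which a missing or broken lock would lose an update.
            time.sleep(random.uniform(0, 0.0002))
            self.value = current + 1


def stress(counter, threads=4, iterations=25):
    """Hammer the counter from several threads and check the invariant."""

    def worker():
        for _ in range(iterations):
            counter.increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    # Invariant: no increment may be lost, whatever the interleaving was.
    assert counter.value == threads * iterations, counter.value
    return counter.value


if __name__ == "__main__":
    print(stress(Counter()))
```

Removing the `with self._lock:` line should make the assertion fail quickly, which is a useful check that the stress test can detect the kind of bug it is meant to catch.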

~~~
AnimalMuppet
> If you have a race condition in a multi threaded program, a simple printf is
> enough to screw the timings and make the bug disappear, it's not at all
> better than breakpoints and stepping in a debugger.

If you have a race condition in a multi threaded program, a simple printf _may
be_ enough to change the timings and make the bug disappear. But a breakpoint
and stepping in a debugger is _guaranteed_ to completely change the timings
(unless your timings are on the order of minutes).

~~~
mannykannot
In concurrent software development, the third way (analysis) is the only way.
You need to know (or find out) which resources can be accessed concurrently,
and develop a theory of how this could lead to the observed error.

------
JustSomeNobody
Writing code is about understanding, also.

I've seen developers bang away at a problem for days and not be able to
describe how it "works" in a code review. They just tried every crazy thing
they could think of until it eventually returned results that looked correct.

Don't be that kind of developer. For one, all the other developers are going
to make you maintain that code for the rest of your life. Two, never be too
proud to ask for help. We can't know how to solve every problem on our own.
It's not possible.

~~~
userbinator
I agree, understanding certainly can't be emphasised enough - I believe that a
good programmer must always be able to understand enough to "mentally execute"
\- manually stepping through code, possibly with pencil and paper or a
whiteboard, and making sense of the results at each step.

There are some who argue against this, and their argument is effectively "if
the machine can do it for you, why should you need to know how to do it
mentally" \--- it fails miserably when debugging, precisely the time when the
machine _can't_ do it.

I've heard it phrased thus: "If you don't understand precisely what you need
to do, what makes you think you can tell a computer how to do it?"

~~~
mwcampbell
But how can I mentally execute code if the whole point of the code is to
interact with a third-party OS facility or application that's closed-source
and therefore opaque? In that case, my mental execution of the code will
depend on a mental model of the third-party component that's probably
incomplete. Might as well just let the machine run the real thing.

~~~
userbinator
It does get a little more difficult when execution flows into code you didn't
write, but you can still validate your post/pre-conditions at those
boundaries.
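
One way to picture validating conditions at such a boundary, as a sketch under stated assumptions: `checked` is a hypothetical helper, and `vendor_resize` is only a stand-in for an opaque third-party routine, not a real library call.

```python
import functools


def checked(pre, post):
    """Wrap a function so its inputs and result are validated on every call."""

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not pre(*args, **kwargs):
                raise AssertionError(f"precondition failed: {fn.__name__}{args}")
            result = fn(*args, **kwargs)
            if not post(result, *args, **kwargs):
                raise AssertionError(
                    f"postcondition failed: {fn.__name__}{args} -> {result!r}"
                )
            return result

        return wrapper

    return decorate


def vendor_resize(width, height, scale):
    """Stand-in for code we cannot read; assume we only know its contract."""
    return int(width * scale), int(height * scale)


# Our side of the boundary: what we promise to pass in, what we expect back.
safe_resize = checked(
    pre=lambda w, h, s: w > 0 and h > 0 and s > 0,
    post=lambda result, w, h, s: result[0] > 0 and result[1] > 0,
)(vendor_resize)

print(safe_resize(100, 50, 0.5))  # (50, 25)
```

A call such as `safe_resize(1, 1, 0.1)` raises at the boundary, because the stand-in returns a zero-sized result, rather than letting that value travel further into code that assumes otherwise.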

------
ams6110
Also a couple of tips for the beginning programmer:

* You've almost certainly not found a bug in the compiler or libraries or language runtime (somewhat more possible if you're using alpha, beta, or dev versions of something).

* Your CPU, memory, or other hardware are not defective.

* You are not experiencing cosmic rays flipping bits randomly in your data.

* The problem is most likely to be a mistake in your code.

I have worked with programmers who are quick to jump at the most unlikely
explanations for bugs, and it's a very timewasting way to work.

~~~
sosborn
> * You've almost certainly not found a bug in the compiler or libraries or
> language runtime (somewhat more possible if you're using alpha, beta, or dev
> versions of something).

This should be in bold type on every page of Stackoverflow.

~~~
ionforce
But when you HAVE found a framework bug, it's both delicious and infuriating.

------
jmount
A quote I have always enjoyed: "Finding your bug is a process of confirming
the many things you believe are true, until you find one which is not true."
Norman Matloff 2002
[http://heather.cs.ucdavis.edu/~matloff/UnixAndC/CLanguage/De...](http://heather.cs.ucdavis.edu/~matloff/UnixAndC/CLanguage/Debug.html)

~~~
ajuc
It's not always 1 thing :)

Some of my "favorite" bugs happened when 2 things were wrong but 1 worked
around 2, and then someone fixed one of them.

------
bcantrill
While it's great to see more attention brought to debugging, some of this is
just (for lack of a better word) insane. e.g.:

 _For searching in space, most programmers have done binary-search with
commented code: comment out or bypass half of your codebase. Do that
recursively until you nail down where (at which file, which function, which
lines) the bug lives._

No, most programmers actually _haven't_ done this for the simple reason that
it's highly unlikely to work: most codebases with half of their functionality
removed _don't actually function_. That this kind of lunacy is being asserted
as authoritative is galling enough, but it gets worse:

 _When binary search doesn't work, you need more creative approaches.
Consider brainstorming to enumerate even the wildest possibilities: for
instance, maybe the bug is in some external resource like a library or a
remote service; maybe there is version mismatch of libraries; maybe your tool
(e.g. IDE) has a problem; maybe it is bit flipping in the hard disk; maybe the
date and time library is sensitive to your computer's settings, etc._

This is advocating debugging by superstition, and it represents toxic
thinking. When we have defects in software, we need to be strictly empirical
in approach:

1\. Make observations.

2\. Think of interesting questions.

3\. Formulate a hypothesis.

4\. Develop a testable prediction from the hypothesis.

5\. Gather data to test the prediction.

6\. Refine, alter or expand the hypothesis.

7\. Go to step 4 until defect has been driven to root cause(s)

If this sounds familiar, it is because it is the _scientific method_ [1] to
which we can credit much of modern knowledge and civilization. As to how this
specifically relates to debugging, I touched on this in my recent DockerCon
presentation[2][3]. Bringing attention to debugging is terrific -- but
advocating superstition as a methodology is anathema to true understanding.

[1]
[https://en.wikipedia.org/wiki/Scientific_method](https://en.wikipedia.org/wiki/Scientific_method)

[2]
[https://www.youtube.com/watch?v=sYQ8j02wbCY](https://www.youtube.com/watch?v=sYQ8j02wbCY)

[3] [http://www.slideshare.net/bcantrill/running-aground-
debuggin...](http://www.slideshare.net/bcantrill/running-aground-debugging-
docker-in-production)

~~~
mwcampbell
I listened to your DockerCon talk on debugging. So in a server that can handle
multiple concurrent requests (e.g. a web application server), should each
unhandled exception abort the whole process (after delivering something like
an HTTP 500 to the client) so the developer can do postmortem debugging on a
core dump? That would cause all other in-progress requests to be aborted too,
unlike simply logging the exception and moving on. But then again, maybe that
would increase the motivation to root-cause every failure.

~~~
dap
I don't mean to speak for bcantrill, but yes, this is exactly how we (at
Joyent) build programs. We wrote in detail about why we do it this way:
[https://www.joyent.com/developers/node/design/errors](https://www.joyent.com/developers/node/design/errors)

It's not that this approach motivates root-causing failures (though that's
true). It's that uncaught exceptions are programmer errors, and it's
tautologically impossible to correctly deal with them. Attempting to do so can
make things much, much worse.

To make this concrete: I've dealt with more than one production outage caused by
the mere presence of an uncaughtException handler. If the program had merely
crashed, a hundred requests may have been interrupted, but the program would
have restarted and resumed servicing requests. Instead, the exception was
thrown from a code path that had a database transaction open with database
locks held. Because the uncaughtException handler just logged the exception
and otherwise ignored it, that transaction stayed open (and the locks remained
held) until a human got into the loop -- interfering with tens of thousands of
subsequent requests. That's much, much worse. If the process had just exited,
the db connection would have been closed, the transaction aborted, and the
locks released.

An unexpected error handler can't know the complete state that the program was
in because by definition this wasn't a state that the programmer thought
about.

~~~
mwcampbell
Thanks. Now I gotta figure out the best way to do that in Python. Most Python
web frameworks and WSGI servers catch all exceptions and keep the process
going.
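
One possible shape for this in Python, offered as a sketch only and not as how any particular framework or Joyent does it: a WSGI middleware that ends the process on an unexpected exception. `crash_on_unexpected` and its `abort` parameter are hypothetical names; `os.abort()` is the standard library call that raises SIGABRT, and whether a core dump is written depends on system settings.

```python
import os
import sys
import traceback


def crash_on_unexpected(app, abort=os.abort):
    """Wrap a WSGI app so that an uncaught exception terminates the process.

    Limitations of this sketch: it does not send a 500 response first, and it
    does not cover exceptions raised later while the server iterates over the
    response body. Something outside the process must restart it.
    """

    def wrapped(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # Record what happened, then stop instead of continuing in a
            # state nobody planned for.
            traceback.print_exc(file=sys.stderr)
            sys.stderr.flush()
            abort()
            raise  # only reached if a non-terminating abort was injected

    return wrapped
```

The `abort` parameter exists only so the wrapper can be exercised in a test without killing the test runner; in real use the default would be left alone.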

------
mannykannot
Understanding is the key to everything in software development. In particular,
understanding both what you are trying to do and the tools (including
language) you are using is necessary for writing code that works.

The former is true even if you are doing exploratory development: in that
case, you need to understand what has been found so far and what the next
iteration is intended to investigate.

------
scott_s
I wrote about the relationship between debuggers and logs a while back:
[http://www.scott-a-s.com/traces-vs-snapshots/](http://www.scott-
a-s.com/traces-vs-snapshots/)

~~~
TheCams
Totally agree with that. In my previous company we were always working with
dumps for asserts/crashes and when I started working in my current company I
was confused that they mainly use logs. Now I can't go back to debugging
without logs. Using dumps is like looking at the consequences, looking at logs
is looking at the causes.

------
k__
Debugging can be fun, too.

I've often found bugs in reporting software that had been showing people wrong
numbers for years.

Or if something needs to be re-implemented, say for performance reasons, you
get the algorithms and notice that the old version wasn't only slower but also
wrong.

When this happens, I start to question if the people really "use" the software
in the first place. Or if they just get some false sense of safety from it,
which calms them, but the underlying system is mostly random and they could
simply make up their own numbers.

------
jorgeleo
If debugging is about understanding, then it will be good to use a language
that lets you embed the knowledge in the coding. Knowledge should be easier to
read from the code, leading to faster understanding, and more bugs are caught
by the compiler instead of by the end user.

This is a great page (with a video) about what I mean

[http://fsharpforfunandprofit.com/ddd/](http://fsharpforfunandprofit.com/ddd/)

------
brudgers
It's an interesting and helpful article, the downside is that it perpetuates
the use of the non-technical language of "bugs" and "debugging." Sure it's
accepted jargon, but like most jargon "bug" doesn't really convey much of
anything beyond a person's attitude. "Bug" is as unscientific as "weed"...a
bug may be the wrong color text in some HTML or as in its origin myth,
something preventable with a can of Raid.

Fault/Error/Failure provides a better language for talking about systems and
their processes. It provides a diversity of solutions from changing the source
code, to handling an error, to just letting it crash...the last is impossible
to cast as a solution in the nomenclature of bugs.

There are times in which the ambiguity of "bug" matches the necessary
ambiguity of the context...just as there are times when describing the sun
moving across the sky is useful. But the geocentric cosmological model breaks
down when we're interested in predicting the location of Jupiter in the night
sky.

------
rumcajz
Just add asserts to your code. Make an assert for any assumption you make no
matter how trivial or unlikely to be broken. And don't disable the assert in
release versions.

That can, as a very conservative estimate, cut down time spent debugging by
50%.

Life is too short to spend it debugging undefined behaviour.
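
A small illustration of the always-on variant, assuming Python: the built-in `assert` statement is removed when the interpreter runs with `-O`, so a plain function keeps the check in release builds. `require` and `apply_discount` are made-up names for this sketch.

```python
def require(condition, message="assumption violated"):
    """Like assert, but never compiled out, even under `python -O`."""
    if not condition:
        raise AssertionError(message)


def apply_discount(price_cents, percent):
    # Assumptions about the inputs, stated where they are relied upon.
    require(isinstance(price_cents, int) and price_cents >= 0, "bad price")
    require(isinstance(percent, int) and 0 <= percent <= 100, "bad percent")

    result = price_cents * (100 - percent) // 100

    # Assumption about the result: a discount never raises the price
    # and never makes it negative.
    require(0 <= result <= price_cents, "discount produced an impossible price")
    return result


print(apply_discount(1000, 25))  # 750
```

A call like `apply_discount(1000, 150)` fails immediately at the second `require`, at the point where the assumption was made, instead of producing a negative price that surfaces somewhere else later.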

------
cpfohl
Using logs as a debugging tool is only step one.

The second step is using a good tool that helps you make sense of those logs,
and includes good debugging information with the logs. A tool like Rollbar
(disclaimer: I almost work for Rollbar) makes it super easy to analyze
patterns in your errors and logging, find out who experienced the error, and
to hear about the bugs before your customer, who may have become used to them.

Anecdote from my previous employer: we had a terrible piece of legacy software
that regularly showed modal pop-ups warning of this or that error, which doubled
the amount of time our clients took to do stuff. They were so used to the
pop-ups that instead of reporting the errors, they just dismissed them.

------
cjslep
Tangential to the topic: The git bisect image has "good" and "bad" backwards,
showing the solving of a bug instead of the production of one. I would hope a
commit in between would clearly state it contains the bugfix.

~~~
staltz
Fixed, thanks!

------
cognivore
While understanding is of course important, sometimes it's possible to just
"zen" the bug. After working with the same codebase for long enough I've been
able to see behavior and think, "That's probably down in the login handling
code..." and then start there. I don't necessarily have to understand the
login code, but it's gnarly, and I've seen similar bugs before from it. I'm
right more often than I would expect. When the zen fails it's off to the
methods mentioned in the article.

------
aalhour
Thank you for sharing this. I learned to use logs to troubleshoot and track my
bugs the hard way. It was definitely a good read.

------
mnw21cam
The Kernighan quote is absolutely key, and is one of the many email signatures
that I use.

~~~
userbinator
An interesting interpretation/rebuttal of that quote:

[http://www.linusakesson.net/programming/kernighans-
lever/](http://www.linusakesson.net/programming/kernighans-lever/)

~~~
agumonkey
Hmm delicious image [http://www.linusakesson.net/programming/kernighans-
lever/flo...](http://www.linusakesson.net/programming/kernighans-
lever/flow2.png)

Worst times are when you're jumping around the flow area.

------
chrisseldo
what tool is he using for the sketches

