I rarely enter an interactive debugger. I have TONS of logging statements I can toggle. I make program execution as deterministic and reproducible as possible (for example, all random numbers are generated from random number generators that are passed around). When something goes wrong, I turn on (or add) logging, run it again, and look for odd stuff in the log files. If it doesn't make sense, add more logging. Repeat.
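The "RNGs that are passed around" idea might look something like this in Python (the function names and the simulation itself are made up for illustration):

```python
import random

# Hypothetical sketch: pass an explicit random.Random instance instead of
# using the global `random` module, so every run is reproducible from a seed.
def shuffle_deck(rng: random.Random) -> list:
    deck = list(range(52))
    rng.shuffle(deck)
    return deck

def run_simulation(seed: int) -> list:
    rng = random.Random(seed)  # all randomness flows from this one object
    return shuffle_deck(rng)

# Same seed, same run -- so a failing run can be replayed exactly.
assert run_simulation(42) == run_simulation(42)
```

The point is that the seed becomes part of the bug report: log it once at startup and any run can be reproduced.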
I worked on a pretty large videogame in the 90s where /everything/ was reproducible from a "recording" of timestamped input that was automatically generated. The game crashed after half an hour of varied actions? No problem, just play the recording that was automatically saved and attached to the crash report. It was amazing how fast we fixed bugs that might otherwise take weeks to track down.
I forgot to mention that we actually found a lot of bugs by seeing a playback diverge from the original recording; this was often due to uninitialized variables or reading from random memory. We could generally see when the divergence happened because we computed a checksum of the state of the world every frame and stored it in the recording as well.
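A toy sketch of that record-and-checksum scheme (the "game state" here is a stand-in; the real thing hashed the whole world state every frame):

```python
import hashlib
import json

# Hypothetical sketch: the game loop consumes timestamped inputs and records
# a checksum of the world state each frame. On playback, the first mismatching
# checksum pinpoints the frame where the replay diverged (e.g. due to an
# uninitialized variable).

def checksum(state: dict) -> str:
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def run(inputs, record=None, expected=None):
    state = {"x": 0, "frame": 0}
    for frame, inp in enumerate(inputs):
        state["x"] += inp          # apply this frame's input
        state["frame"] = frame
        c = checksum(state)
        if record is not None:
            record.append(c)       # original run: store per-frame checksums
        if expected is not None and expected[frame] != c:
            return frame           # playback: report first divergent frame
    return None

recording = []
run([1, 2, 3], record=recording)
assert run([1, 2, 3], expected=recording) is None   # faithful replay
assert run([1, 9, 3], expected=recording) == 1      # diverges at frame 1
```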
I wonder how/if this could be applied to network-facing software like a daemon, or programs that transform large amounts of data.
What is a debug statement, anyway? You're checking the state at a point in the code. That's exactly what a unit test assertion does... except that a unit test calls a single method/function... meaning the code you're trying to debug needs to be split up and written in a way that is easily unit-testable, with few, if any, dependencies that aren't explicitly given to it... which makes it better code that is easier to reason about (and thus results in fewer bugs).
See where I'm going with this? TDD = say (mostly) goodbye to "debugging"
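To make the "debug statement = unit test assertion" point concrete, here's a minimal Python example (the function and values are hypothetical):

```python
# Instead of sprinkling a throwaway print at the point of interest...
def parse_price(text: str) -> float:
    value = float(text.strip().lstrip("$"))
    # print("parsed value:", value)   # the debug statement you'd delete later
    return value

# ...capture the same check permanently as a test assertion:
def test_parse_price():
    assert parse_price(" $19.99 ") == 19.99

test_parse_price()
```

The check survives the debugging session instead of being deleted with the print.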
I could say "the time I spend debugging has dramatically decreased since I began proving each bit of code to be correct mathematically." But that tells me nothing about whether it is actually a better approach.
I suspect that's why you're getting downvoted: the comparison is naive. (Edit: Also responding to 'how do you debug' with 'I don't' probably doesn't help).
My personal anecdote - I don't spend much time debugging. I spend a lot of time thinking, a smaller amount coding, and a relatively small amount debugging. Spending, say, 20% extra time preventing bugs before they happen would not be cost effective for me.
tl;dr If you consider a 90% reduction in bugs (and debugging, and unpredictable amounts of post-production debugging time, etc.) worth a 15-35% extra cost in upfront (but more predictable) development time... then you should be doing TDD, full stop.
If you can't figure out how to apply TDD to the problem, then look at the problem again. I/O is a common tripping point. Watch Gary Bernhardt's "Boundaries" talk for ideas: https://www.destroyallsoftware.com/talks/boundaries
I repeat... a 90% reduction in bugs. I think that's pretty damn huge. Read the paper, then google "TDD empirical data." Actually, I'll do it for you: https://www.google.com/search?q=TDD+empirical+data
Trust me, I am all about doing things rationally and staying away from faddy things that have no data to back them up, but in this case, this one seems pretty solid.
Personal experience is that the code I write using TDD is better along almost all dimensions that factor into determining what "good code" even is: More modular, fewer dependencies, easier to maintain, easier to debug (when you must), smaller and more focused methods/functions, better modules/classes... These more intangible perks literally fall out of the sky like manna if you code with TDD.
It's admittedly tough to change to this habit, but now that I've been doing it, I would not go back. At all.
Thanks for the patronising 'let me google that for you'.
I'm glad you saw the light, I'm content to remain a pagan.
> If you consider a 90% reduction in bugs worth a 15-35% extra cost.
I don't spend more than 31% of my time debugging, so 35% for a 90% reduction is nowhere near useful. I rarely spend even 13% of my time debugging any given bit of code. And when I do, I write unit tests as a debugging strategy, so it wouldn't have saved anything to write them beforehand.
If you find yourself spending 30+% of your time debugging new code you've written (unless you write tests beforehand), then I respectfully suggest there are more pressing things to worry about in your practice.
I mean, there's a rational explanation for it: Your code produces states. What is a bug? An unexpected state. As long as you can contain/model ALL of those states, in your head, at all times, while working on the code, you can potentially write 100% bug-free code, at all times. But as soon as you cannot, then the tests will help contain the many possible states your code produces, and that your brain can no longer contain.
And unless you are infinitely genius, your code will eventually reach a point where you simply cannot model all of its potential states in your head (this goes manyfold on a programming team where everyone is sharing bits of the "mass state" the code can produce.)
Speaking of controlling unpredictable states and therefore bugs, FP often comes into the conversation as well, here's John Carmack on that topic: http://gamasutra.com/view/news/169296/Indepth_Functional_pro...
If you start thinking of TDD as specifying what the code should do, then you probably should be writing tests for your tests to ensure they test what you mean for them to test. And if your code is compartmentalized into small enough components, it is just as easy to write a correct test as it is to write correct code. If you are doing that, then writing tests is clearly a waste of time. (And you should be doing as much of that as possible.)
In my experience, proponents of TDD often end up actually defending testing when pushed, rather than TDD. As if TDD or write-and-forget code are the alternatives.
> unless you are infinitely genius
We've established, based on your own numbers, that you only need to spend less than 10-30% of your time debugging your code before it isn't worth it. There's no need to exclude the middle and pretend it requires infinite genius.
I've noticed that TDD forces me to think about the code in a better way, before actually coding it, than just going ahead and coding it and then figuring out how to test it after.
This is by no means an easy practice to adopt, btw, especially after years of not doing so.
I actually think TDD should be taught in schools and never even considered a separate aspect of coding; TDD should be integral to coding, period. If it hadn't been discovered separately in the first place, it would just be called "coding."
> that you only need to spend less than 10-30% of your time debugging your code before it isn't worth it
That is a good point.
It's worth it down the line. You're not factoring in future costs. You're not just reducing TODAY'S bugs by 90% with that increase in overall coding time, you're also vastly reducing the technical debt of the code in the future.
You're also writing a permanent, provable spec for the code. What happens 5 years after you write your untested but debugged code and have to go back to it to add or fix something? How in the hell will you remember the whole mental model and know if you are in danger of breaking something? The answer is, you will not. And you will tread more useless water (time and effort) debugging the bugfixes or feature-adds or refactorings.
Speaking of refactorings, they are almost impossible to do without huge risk unless you have well-written unit tests against the interface to the code being refactored.
In short, if you do not write tests, you are literally placing a bet against the future of your code. Do you really want to do that? Do you consider your work THAT "throwaway"?
That said, tests are no panacea... and I am NOT trying to sell them as one (which would be wrong). You might assert the wrong things, or miss testing negative cases (a common one is not testing that the right runtime error is thrown when inputs to the code do not conform). There are cases on record of well-tested code that has passed all tests (like satellite code) and still fails in the real world because of human error (both the code AND the test were wrong).
IMO, that is the crux of the matter: Thinking-Driven-Design is the way to go. The idea that you _need_tests_ to do the up-front thinking is, again IMO, bogus, and writing tests without thinking doesn't help much, as you seem to agree given your remark on missing test cases.
Some people use paper or a whiteboard as things that make them think. Others go on a walk. Yet others, sometimes, can just do it sitting behind their monitor while doing slightly mindless things such as deleting no longer important mail, or setting up a new project.
Also: good tooling makes many kinds of refactorings extremely low-risk. Strongly-typed languages that are designed with refactoring in mind help tons there.
I have been struggling for a while to integrate testing into my work. TDD is easy and perfectly suited for "normal" software development, where there is some kind of plan, and you are writing a lot of code that changes little after it is written.
I'd be very interested in pointers on how to apply TDD to little code that changes a lot after it is initially written.
One thing testing does require is determinism. In other words, given all input states X(1) through X(N), output Z should result 100% of the time. If that is not the case, then you haven't accounted for all input states (for example, code that looks at the time and acts based on it; the time is an oft-unaccounted-for input) or you have code that calls something like rand() without a seed.
If you can get your code into a deterministic state, then it is testable.
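For instance, the time-as-input case can be made deterministic by passing the clock value in at the boundary (a hypothetical Python sketch):

```python
import datetime

# Hypothetical sketch: the current time is an input like any other, so pass
# it in explicitly instead of reading it inside the code under test.
def is_happy_hour(now: datetime.time) -> bool:
    return datetime.time(17, 0) <= now < datetime.time(19, 0)

# Deterministic and testable: given input state X, output Z, 100% of the time.
assert is_happy_hour(datetime.time(17, 30)) is True
assert is_happy_hour(datetime.time(12, 0)) is False

# Production code supplies the real clock at the boundary:
# is_happy_hour(datetime.datetime.now().time())
```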
You are preaching by the way.
Going to quote one of the authors (Hakan Erdogmus) from here: https://computinged.wordpress.com/2010/11/01/making-software...
"I am one of the authors of the TDD chapter. Ours was a review of existing some 30+ studies of TDD. Yes, Nachi’s experiments (the MSR study mentioned above) were included. BTW, I wouldn’t have concluded that “tdd doesn’t work” based on our data. Rather, I would have conservatively concluded: there is moderate evidence supporting TDD’s external quality and test-support benefits, but there is no consistent evidence in one direction or other regarding its productivity effects."
The only thing inconclusive was the increased productivity effect, not the code quality effect.
> It's especially not clear whether it's worth it to do TDD in comparison with simple automated testing and other testing strategies.
I've done every testing strategy. Test-none, BDD, TDD, test-after, integration testing, unit testing, GUI testing, you name it.
If your code will last longer than a couple years, it is worth it to do extensive unit testing via TDD, integration testing of major/important workflows, a visual once-over by a QA person to make sure something didn't break the CSS, and that's it. If you are the founder of a startup and expect to sell within 2 years (and are thus incentivized not to go the TDD route), you better be correct in your bet or there will be technical-debt hell to pay.
Espousing something that I have found over and over again in my programming career (testing since about 2004, TDD since about 5 years ago) is now "preaching"? Call me a prophet, then. I know exactly what I see. Don't listen to me though, I've only been coding since I was 8 in 1980... I encourage you to find out for yourself.
You on the other hand seem to be certain that it works. So certain that you're using no qualifiers, writing lengthy replies and selectively providing links that support your assertions. When pressed for information you provide a reasonable defense of unit-testing not TDD.
Yes, I know unit-tests are nice when part of a testing strategy together with integration testing, UI testing, etc. I am not at all convinced that TDD is better and you haven't changed that.
I am ignoring personal opinions and blog posts because too many software engineering practices are just popular rituals. TDD proponents need to prove conclusively that TDD is significantly better than selective test-after to offset the productivity loss, and they haven't done that.
TDD forces your code to be tightly focused. It's very hard to write a test for reams of functionality before you write that functionality, so your code is automatically tight (and as a result, easier to refactor, maintain, understand, etc). I don't see how this is so hard to see or why you need empirical evidence for that part at least. A lot of what "value" is is quite subjective, even in programming. You know "good code" when you see it. Why don't you try TDD and form your own opinion?
I used to be a Visual Studio Debugger Wizard (BTW, it's an excellent debugger)... now I don't remember the last time I used a conditional breakpoint.
Working on a codebase designed, from the start, for testability changed everything. So I totally agree with your last sentence about TDD; although it took me nearly one year of practice before I could write solid unit tests (ones that wouldn't break every now and then because of some interface change), and I still find it hard to write them before the code being tested (however, I find it even harder (impossible?) to write a unit test for code written more than one month ago).
I still use cgdb from time to time, to quickly print a backtrace or the origin of an exception/segfault.
By the way, I have the feeling that language constructs like lazy arguments, RAII, scope(exit), concurrency, and exceptions, make following the flow of control less and less relevant. In the long term, some amount of emancipation from one's debugging tools might be strategic.
Now it's a week later and all of a sudden my function is returning -10e8. Where does TDD help me with debugging?
Sometimes debug printfs out a UART or USB if my system lets me, sometimes I'll create a log in memory if I can't printf or if the issue is timing sensitive (which a lot of my work is).
Pen & paper end up being very useful too - often writing out what I'm thinking helps me figure out what's going wrong faster than poking around the code or circuit board.
Sorry for fanboying out, but it's the first time I hear of them outside of work and I was pleasantly surprised to see them mentioned.
As a Python/JS developer a few print/console.log statements are usually all it takes for me to figure out what's wrong with something. For more thorny situations there's always PDB/chrome dev tools.
At the end of the day, the people who are the best at debugging things aren't that way because of the tools they use. They're the best because they can clearly visualize the system, how data is flowing through it, and where potential problems might arise. Intuition from dealing with other similar problems also helps.
But I can't give that to people through a comment on HN, so I stuck to tools.
Debugging for me is about my using my brain to step through code, not some fancy IDE that stops me from thinking. It wasn't always so easy though, but the first step is to stop using big tools to help you.
Also, sometimes even just using console.log can cause bugs to appear or disappear. I recently encountered a bug which was almost impossible to diagnose with console.log, because the string returned by the .toString() call didn't correspond to the real object's actual properties. Of course, this is a rare case, but it highlights the benefit of trying different tools!
For any object foo:
type(foo) # show an object's type
dir(foo) # list an object's methods
help(foo) # pydoc
id(foo) # show an object's memory address
foo.__dict__ # show an object's internal data structure
import pdb ; pdb.set_trace() # drop into the interactive debugger here
Granted, often those moments are cases where the code is working correctly but you misunderstood or misremembered things, but the fact that you identified (and resolved) the disconnect is valuable, particularly if you're doing a deep-dive to figure out a nearby problem.
>They're the best because they can clearly visualize the system, how data is flowing through it, and where potential problems might arise
This also applies very well to appsec/vulnerability finding.
(1) Just add debug statements near where the bug is happening. These print a string, and the name and value of variables. Printed values in Lisp are human-readable, not just a pointer.
(2) Trace selected functions. This outputs the function name, arguments, and return value on function entry and exit.
(3) Use the virtual machine debugger. I can set breakpoints, display the stack, and trace VM instructions, but it's most useful for printing out disassembled compiled code.
(4) Browse data structures in Firefox. While my Lisp is running, a web browser runs in another thread, and every symbol has its own URL. Data structures are displayed as HTML tables.
(5) Unit tests. I've used these to debug complex algorithms, e.g. for event handling, and type inference.
I don't have to recompile and run the code in debug mode to do this.
Lisp has symbols, which are more than just variables. You can browse starting from any symbol.
I've never read about specific methods for doing this "by the book". Here's what I usually do (for Node.js):
1. Set a breakpoint inside my Node.js code, 2-3 lines before the exception I got.
2. Run in debug mode.
3. Do what I need to do in order to reach the breakpoint.
4. Analyze the variables inside (via watch) or run code in that function (via the console).
Helps a lot more than `console.log(some_var_before_exception);` :D
Would you be able to debug something without these tools?
Do you think potentially these tools abstract some of the work away from you?
Genuine questions, just interested
When you have an exception, you put a `console.log(some_var);` in your code and run until you reach it. Your next step is usually to fix your code and run it again. This time you see the corrected value in your log. Easy fix.
If the problem, though, is caused by some other part of the code, then you need to move the `console.log` statement up through your callers' functions until you see where the problem is. That sucks.
Now let's see what happens when you use a debugger:
When you reach that point, you check your variable, see that there is something wrong with it, check the caller, step inside the caller, and manipulate the code until you see everything is where it is supposed to be. One run fixes all.
Now straight to the point:
> Are you sure it helps more than console logging?
Until you get used to the debugger, you will ask yourself this question. Console logging is bad for recursive functions, loops, huge variables, etc. How many times have you written `console.log(some_var);` and then, one run later, `console.log(some_var['some property'])`, digging even deeper?
> Would you be able to debug something without these tools?
Sure :) Sometimes I don't use it at all. e.g. when I'm doing a quick thing and use Sublime Text instead of WebStorm.
> Do you think potentially these tools abstract some of the work away from you?
I haven't run any benchmarks. Maybe I'm a bit slower with a debugger for 90% of the exceptions, but there is always that one exception, hard to console.log, that compensates for all that time.
As a counterpoint to this, I was pretty much exclusively a debugger guy for the first 10-12 years of my career. Now I'm pretty much exclusively a logging kind of guy. Familiarity with a debugger has nothing to do with it.
> Console logging is bad for recursive functions, loops, huge variables, etc.
And debuggers are bad for focus/blur event handlers in UI code because they change the focus state of the thing you're trying to debug.
Ultimately, neither of them is perfect, not all problems are alike, one is not an objectively better tool than the other. They both have merits.
I have a strong preference for having a debugger so I can set break points and test assumptions in a REPL at that point in the code, though.
I'd say it's pretty much required to do PHP (and especially Wordpress).
On that note, WP Core team, would it kill you to extend some basic array functions to WP_Query objects?
Only after I've grappled with these questions will I move onto log analysis, printfs, the debugger, data fuzzing, etc.
this 99% of the time
I use print statements > 50% of the time, but certain problems are better suited to the debugger, especially if it's code that I did not write.
Debuggers are great, but the knowledge gained by using them to solve a problem is completely lost once that close button has been pressed.
Also if I'm having to use a debugger to work out what's going on, usually it's a good sign my code is overly complicated...
If it's something I think is trivial, I'll just use a few print statements. This is 90% of the time.
If I end up with too many print statements, then I step into the debugger. Others scoff at debuggers, which is odd because they can be powerful tools. Maybe you only use one once every couple of months, but they can be very helpful. When you're waist-deep in the stack, dealing with action at a distance, trying to fix something in poorly factored code, wanting to watch a variable, suspecting some weird timing issue, or needing to step down into libraries beyond your control, a debugger can help.
Don't think of the debugger as a debugger, think of it as a REPL. You just happen to be using the REPL with buggy code.
That's a great analogy.
"You just happen to be using the REPL with buggy code."
Despite the name "debugger", it's not just for buggy code. A debugger can be a very useful tool for understanding how someone else's code works.
If I can narrow it down to what line, or even file, is throwing an error I just take a few minutes, read all the code and all the code of branching methods, and then can narrow it down to a single line.
From there it is actually developing a fix. As you mess around with more and more languages, you will notice that most compilers lend something far from a helping hand.
This only works, and I will stress this, for programs under 1 million lines. Past that mark, you need to do some extra steps.
When I debug one million line projects, I narrow it down to a file. I pull out the code from the file, and I mock all of the methods that file calls (This gets REALLY hard with networked code. Trust me). From this small subset, I slowly break the functionality of the external method until I resemble the error being made in the main project. From that I now know the method(s) that are actually causing the problem.
But, there is one thing that this makes an assumption about: your compiler is working.
Put blatantly, they're crap.
Usually they won't show you the correct file causing the error or they will not generate a helpful error. Runtime errors are even worse.
The best thing to do is avoid making the tricky errors in the first place. Write unit tests, use fuzzing tests, and keep well-made documentation.
Documentation that details all of the possible input and output states of a function will save you days on some bugs.
In Java, the @Nullable annotation is a godsend. Use these features; they WILL help.
If you do tests, fuzzing, and documentation, then using your brain (and some things that make your brain's job easier) will make you faster at debugging than any debugger, like your bud's GDB/DDD setup.
So imagine doing things like narrowing down execution to just before and just after your error, then taking snapshots of the runtime memory and diffing the objects. Or a conditional breakpoint that changes the class of a particular instance to a special debug class.
You can do many of the same things in compiled languages, I've since discovered, if you have a decent incremental compile set up, and you use some tactical thinking. But the environment always seems like it's trying to get in your way. (As opposed to a good dynamic environment, which seems more like an eager golden retriever wanting to play more fetch.)
1. Reproduce the bug as consistently as possible.
2. Find a pivot for the bug. Whether this is a previous commit where the bug did not occur, or a piece of code that can be commented out to clear the bug, I need to find some kind of on/off switch for the behavior.
3. I comment out code / insert break points / use git bisect to flip the switch on and off, performing a kind of binary search to narrow the scope of the issue until I have it down to one line or method.
4. Once the source is found, read the surrounding source code to gain context for the error and reason toward a solution.
Of course, this works best if the bug originates from a single source. Sometimes this is not the case, and there are multiple variables interacting to create the undesirable behavior. That's when things get really fun :)
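Step 3 above is literally a binary search. A generic sketch of it in Python (the predicate and the "commits" list are stand-ins for whatever on/off switch you found in step 2):

```python
# Given an ordered list of "switch positions" (commits, code chunks, etc.)
# and a predicate that says whether the bug appears, find the first bad one.
# This is the same search `git bisect` performs over commit history.
def first_bad(candidates, is_bad):
    lo, hi = 0, len(candidates) - 1   # assumes candidates[hi] exhibits the bug
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid                  # bug already present at mid: look earlier
        else:
            lo = mid + 1              # bug introduced after mid: look later
    return lo

commits = list(range(100))            # stand-in for a commit history
assert first_bad(commits, lambda c: c >= 37) == 37
```

Each probe halves the search space, which is why a few toggles are enough even for long histories.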
When debugging difficult, intermittent problems (e.g. non-repro crashes) my strategy is to keep a log of when it occurs, add lots of diagnostics and asserts around where I think the problem is, until hopefully I can catch it in the debugger or notice a pattern.
90% of the work of debugging is creating a quickly reproducible test case. Once you have that you can usually solve it.
Being able to quickly reproduce the bug time and time again makes a big difference. Some permanent verification that it's actually fixed (at least in the given case) at the end of the session is also nice and adds a lot when doing a major refactoring or something similar. Especially for bugs related to the domain specific requirements, rather than the technical ones.
It depends strongly on the circumstances of course.
I remember quite a few times sitting next to someone trying to debug something, asking something like: "So are we sure that parameter there is correct?" ... they'll say "Oh yeah, that's definitely not the problem" ... fifteen minutes later, after bashing our heads on the desk a bit, we actually check that value: "Whoa, what?! That's impossible!"
in the olden days when i used ide's like visual studio or netbeans, i'd oftentimes leverage their native debuggers to set watchpoints and step through code. but those days are over; now i mostly use interpreted languages like python and ruby, and compiled languages like golang (highly recommended). print statements are the way to go, especially if you're writing server-side code (restful apis, websockets, etc), since you'll want the log information and you won't be able to attach a debugger to a production system.
just a random thought based on this topic: if debug/log/print statements were detailed enough, one could actually take a log file and write some code to parse it and transform it into test cases in your favorite test framework, which could save some time writing test cases. for production bugs, you could take the log output as the reproduction steps to generate test cases covering them.
and i really liked the comment about tdd and, more importantly, unit testing; it's critical and helps developers better organize their code.
It's actually my first line of defense; after that, printf statements, and then gdb + frama-c.
One really nice tool is
frama-c -cg <files>
However, I do use gdb from the command line on occasion. Code I write is pretty heavy on global variables, and with gdb you can poke about and see what they are. You can also use gdb to look at internal processor modules.
To get around the limits of not being able to use break points I have a command line interface built into the firmware, which I use to poke and prod for debugging. I'm dimly aware that almost no one else does this, but can't for the life of me figure out how people get by without it.
I also have a critical error handler that can save information off to a no-init section of memory, then reset, recover, and log the error via the serial port on startup and via the radio. This is useful because, for instance, I have it hooked into the bus fault interrupt, so I can pull the offending instruction's address off the call stack. The binutils program addr2line.exe rats out the offending line of code about 99% of the time.
For timing related stuff I make heavy use of toggling port pins and watching what happens with an oscilloscope.
For networking stuff sometimes I use wireshark.
For C#/.NET development I use Visual Studio and logging, either to a window or to a file. However, I've noticed that when other programmers work on my code they immediately delete that stuff and switch to printing to stderr.
Set a breakpoint in the code, refresh the browser, and all the variables in the scope will be annotated with their value at break time.
This is really what you're after when you're println debugging - it has the advantage of showing you everything in a minimally intrusive way which is helpful when you don't know what you're looking for exactly.
IntelliJ is a pretty complete suite of tools, a pleasure to use (it has a Vim mode too :P).
In Python I mostly rely on printf debugging, especially when working on multiprocessing programs, which I do rather frequently. With multiprocessed Python, pdb is useless; pdb is great, but not for a multi-process program. Most of my issues are related to sub-optimal API documentation that fails to point out argument types, and I find I do a lot of trial-and-error programming, for which printf's are great. Occasionally I drop into `import pdb; pdb.set_trace()` to inspect objects, or into the REPL to try out ideas.
In Perl and shell, which I use ever more infrequently, the equivalent of printf debugging is the norm. The only language in which I have found the debugger to be the first choice is Java.
With Rust, I find myself resorting to a debugger surprisingly seldom. Its type system and the safety checks in the compiler catch most of the mistakes I make.
I don’t do much C programming anymore, but if I did, I would be interested in using rr (http://rr-project.org/), which allows you to record a run of the program and replay it _exactly_ as it was the first time, essentially letting you reproduce and fix race conditions.
But not all bugs are easy to solve. The worst kind are the hard- or impossible-to-reproduce ones. For those, my approach is to pick suspected places where the bug could occur, add logging, and just wait until it occurs again (sometimes that takes weeks or even months). I try to log how the data flows through the system, to isolate where to look.
For example: I have yet to find the cause of the mystery of MS Word and TinyMCE, where sometimes it will not strip out Microsoft's formatting. It only occurs about once a month. I wrote a simple alert script which sends me an email when it happens, so I can get to the person in seconds (most of the users are in the same office) and try to reproduce the exact moment when it occurred on the user's computer.
My fix so far was just to show an error asking users to re-paste the exact same text, which then works as expected.
But I think IDEs can't beat real-time debugging tools, like console.log or Ruby's better_errors gem; having full real-time access to application logic/code right at the spot, you can't beat that.
For most bugs I look at, I usually wish that Linux had DTrace. I can't tell you how many weird bugs I've found that would've been debugged in 20 minutes of DTracing. For example, I'm currently investigating a weird memory leak in Docker that actually appears to be a reference leak in the Go runtime. It took me several days to figure out it was a bug in the runtime (if I had DTrace I could've found symptoms of the issue much faster).
But in general, most of the bugs I find can be debugged with some print statements if I can correctly make an educated guess where the bug lies. For other sorts of issues, unit tests or creating minimal test cases works pretty well and for everything else I'll jump into a debugger if it's really necessary.
Logging doesn't give you anywhere near the power that a good debugger does.
Statically-typed language? Dynamic? Makes a difference.
Point is, 90% of whatever answers you receive here will not be well-suited to your particular situation.
Probably says something about my coding practices that I've gotten good at it.
If you do that right, you can skip the reading and go right for the beef. Breakpoint.
Last but not least - good architecture. The worst heisenbug won't cost you as much as the smallest architecture error.
But since I work on consumer desktop software, occasionally a customer will encounter a problem that I can't replicate on my dev machine, and their bug description doesn't help me locate the problem. In that case, I try to duplicate the customer's system as much as possible: I have Parallels Desktop & VM images of almost every version of Windows, and I can configure the amount of RAM. Sometimes it's easier to ask the customer to run a special Debug build, but if they're not very tech savvy, Parallels can often help me reproduce the bug myself.
Snapshot debuggers like Google Cloud Debugger are probably the way forward. Alas it doesn't support Python 3 yet.
Desktop apps w/ C/C++ -- IDE-based debugger (Visual Studio, GDB w/ a front end, ADB), print / logging statements
Embedded C -- IDE debugger (Green Hills, ARM DS-5, IAR, Eclipse w/ remote GDB, etc.), GPIO ports, logic analyzers, oscilloscopes, etc.
Apps -- ADT / ADB, print / logging statements
Python, bash, sh, .bat scripts -- print / logging statements
As many others have mentioned, having a consistent bug-reproduction methodology is vital, a strong mental model of the SW and its various components is important, and a willingness to dive deep and question assumptions is critical. That is, don't assume your compilers or various OSes are infallible.
I also output my code's state in .dot format to visualize the flow of data quite a bit. This is extremely useful in state-machine-like stuff - I can basically create animated GIFs (well, not really; it's more me pressing the right arrow in my image viewer) to watch how my programs execute.
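A hedged sketch of emitting a state machine as Graphviz .dot text; the states and transitions here are invented for illustration, and the output can be rendered with `dot -Tpng`:

```python
def to_dot(transitions, name="fsm"):
    """Render a {(state, event): next_state} dict as Graphviz .dot text."""
    lines = [f"digraph {name} {{"]
    for (src, event), dst in sorted(transitions.items()):
        lines.append(f'    "{src}" -> "{dst}" [label="{event}"];')
    lines.append("}")
    return "\n".join(lines)

transitions = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "idle",
}
dot_text = to_dot(transitions)
print(dot_text)
```

Generating one .dot file per step of execution, each with the current state highlighted, is what makes the frame-by-frame "animation" in an image viewer possible.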
Why not Delve? I do actually use Delve. Just not as much as logf()s and panic()s.
Hmm... in retrospect, I also dump a lot of data (neural network weights, for example) into CSV for debugging.
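That kind of dump is a one-liner with the standard library; here plain nested lists stand in for real network weights, and the file is simulated with an in-memory buffer:

```python
import csv
import io

def dump_weights(weights, fileobj):
    """Write a 2-D weight matrix as CSV so it can be eyeballed in a spreadsheet."""
    writer = csv.writer(fileobj)
    writer.writerows(weights)

weights = [[0.1, -0.2], [0.3, 0.4]]
buf = io.StringIO()
dump_weights(weights, buf)
print(buf.getvalue())
```

In real use, pass an open file instead of the `StringIO`, one dump per layer or per training step.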
That being said, I'm not trying to take anything away from log-based debugging, there have been many times when log-based debugging has saved my bacon, but it feels strange that there is almost an attitude of interactive debuggers being "lesser" in these comments.
When you have a choice between a primitive tool which will definitely work and a sophisticated tool which may or may not work, sometimes you just want to get on with it, and whatever time you have to spend thinking about your tools is time you can't spend thinking about the problem you're actually trying to solve.
I once debugged a bootloader issue on an embedded device with no feedback but a single blinking LED. It took a while, but I kept trying different things, "logging" the program's output via patterns of blinks, and staring at that flashing light through test after test eventually told me what I needed to know.
On the other end of the scale, during the minute or so I worked at Google, the "library" I worked on that various "applications" would "link against" was actually an armada of independent processes on separate machines in a giant datacenter firing reams of HTTP requests and responses at each other. Stopping one of them and interrogating it via debugger would have been about as informative as the conversation I'd have with an ant if I caught it in a jar and asked why its friends decided to have a party in my kitchen.
Between those extremes, there undoubtedly exist many places where interactive debuggers would be useful; but having spent enough time unable to count on them, the effort it takes to use such tools grows increasingly difficult to justify.
I do use the perl debugger though when I am writing perl and when there is a need. The benefit to this debugger is that it comes built into the language.
Then just put a conditional breakpoint and wait for it to confirm the error and break. Once there I can probably reason backwards to an earlier state that led to the error condition such as "an item must have been removed from a placed order" so again a conditional breakpoint in the remove method with the condition that the order is already placed. Rinse repeat.
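A sketch of that workflow in Python: in a real pdb session the condition goes on the breakpoint itself (`break orders.py:42, order.placed`), but here the condition is modeled as a guard that records the suspicious event so the idea is testable. The `Order` class is invented for illustration:

```python
class Order:
    def __init__(self):
        self.placed = False
        self.items = []
        self.stops = []  # stand-in for debugger stops

    def add(self, item):
        self.items.append(item)

    def place(self):
        self.placed = True

    def remove(self, item):
        # the "conditional breakpoint": only fire when the order is already placed
        if self.placed:
            self.stops.append(("removed from placed order", item))
        self.items.remove(item)

order = Order()
order.add("book")
order.add("pen")
order.remove("pen")    # fine: order not placed yet, no stop recorded
order.place()
order.remove("book")   # suspicious: this is where the breakpoint fires
print(order.stops)
```

The payoff is exactly as described above: the program runs at full speed until the one state you care about actually occurs.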
One thing I have found helpful is to compile the code using different compilers and on different platforms. A bug that is hard to reproduce on one OS can become deterministic on another OS.
- Can you reproduce it? (locally)
- No? Then can they reproduce it? (remotely)
- No? Then can you follow the flow byte-by-byte by just looking at the code? You should.
If you can reproduce it, great, you can most probably brute force your way into the cause with local monkey-logging or step-by-step debugging.
If a customer can reproduce it then you may have a shot at remote debugging, injecting logging or requesting a dump of some sort. That's why it's important for an app to have good tools built-in so a customer can send back useful debug info.
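For example, a built-in "send debug info" helper might look something like this sketch; the fields collected are purely illustrative:

```python
import json
import platform
import sys

def debug_report(extra=None):
    """Assemble a JSON blob a customer could attach to a bug report."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "argv": sys.argv,
        "extra": extra or {},
    }
    return json.dumps(info, indent=2, default=str)

report = debug_report({"last_action": "export"})
print(report)
```

A real version would also gather app version, recent log lines, and relevant config, with care taken not to leak anything sensitive.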
If you can't reproduce it, then take a shot at following the flow byte-by-byte, either mentally, with test cases, or a combination of both. Here's a quick guide off the top of my head:
- determine if there are black spots where the variable, stack, heap etc. could hold unexpected data, where your assumptions could be wrong, or where your understanding of the language, library or any technology supporting the logic could be incomplete and need a reread of the manual.
- order your black spots by probability, starting with the most vulnerable code related to the bug (e.g., for that infinite-loop bug, the recursive function tops the list of weak spots)
- now compare the bug's symptoms against that vulnerable code to check for a 100% match. That way you make sure all symptoms can be caused by the alleged culprit.
- do the negative symptom match too: think of symptoms that would be caused by that fault and make sure they can be observed (e.g., the recursive function writes zeros to a file besides looping forever - did that happen?)
- if there's more than one possible cause, apply Occam's razor: the simpler one, with the fewest assumptions, is the cause, however unlikely it seems.
- if still no possible explanation exists, start over with fewer moving parts.
- if a vulnerable fragment has been identified, but no concrete cause or solution found, rewrite the code for robustness, with plenty of assertions, complementary logging and clear error messages. This is good practice anyway: every time you revisit code, it should come out cleaner and more robust than before.
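As a sketch of what "rewrite for robustness" can look like, here is a made-up `parse_price` function (not from the thread) with assertions, complementary logging, and clear error messages:

```python
import logging

log = logging.getLogger(__name__)

def parse_price(raw):
    """Parse a price string defensively, failing loudly with clear messages."""
    assert isinstance(raw, str), f"expected str, got {type(raw).__name__}"
    cleaned = raw.strip().lstrip("$")
    if not cleaned:
        raise ValueError(f"empty price string: {raw!r}")
    try:
        value = float(cleaned)
    except ValueError:
        raise ValueError(f"unparsable price: {raw!r}") from None
    assert value >= 0, f"negative price {value} parsed from {raw!r}"
    log.debug("parsed %r -> %s", raw, value)
    return value

print(parse_price("$3.50"))  # → 3.5
```

Even if this never reveals the original bug, the next failure in this code will now point at itself instead of corrupting state silently.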
It is the non-deterministic bugs that drive me crazy. I have one bug where a call to a third party library randomly fails but only after the program has been running for days (no it is not a memory leak). If I make a cut down stub then the error never occurs even after running for a week. My best guess is I am trashing memory somewhere, but under valgrind everything is fine. Arg!
I've also gotten into the habit of building SDL projects in Visual Studio to use the console so I can just dump whatever I want into it.
I'm probably an example of what not to do in many cases, but still get the job done.
std::cout << "got here" << std::endl;
gdb on OS X is such a horror show.
If there is no exception and there is some kind of logic bug,
I search my way to the offending logic code by grepping and reading.
When I have found the approximate location of the issue, I will probably set up some var_dump's or log.Println calls in the code to get a better understanding of what is happening.
After that it is usually a done deal.
It lets you run your tests on node.js, so your test runner does not have to start a browser etc.
I'm interested about FalcorJS, does anyone use it ? I found it very interesting, check this out for more at: https://reactjs.co/2016/02/03/what-is-netflix-falcor-and-why...
Anyone who'd like to learn a more systematic debugging process should take it.
Windows during development: Visual Studio.
Windows prod: DebugDiag to get the dumps, combo of VS, DebugDiag and the various native debuggers for analysis. Dumps created by the user or the system are also input.
Windows event tracing is also absolutely fantastic IF you have it.
I feel no shame.
Next most important thing is the network requests tab-- seeing what data is coming back from the server, if any, is indispensable.
If I'm debugging minified code that we haven't set up source maps for yet, I'll find a string literal in the source to match up the minified code to unminified code so I can see what I'm doing anyway by looking back and forth.
When I have to reproduce a bug, I often use the FormFiller extension for Chrome to quickly navigate our forms without having to fill them out.
I use EditThisCookie (another Chrome extension) to modify or view the cookie as I work, or to delete it to start a session from scratch. I don't like Incognito mode because I don't have my extensions and it doesn't maintain breakpoints when the tab is closed and reopened.
With regards to the call stack, being able to black-box certain scripts is awesome. You can right click a script in the sources explorer on the left side of the DevTools and black-box it, which will stop it showing up in the call stack. No more jQuery / Angular / Underscore calls cluttering my callstack!
What else...whenever I'm debugging CSS I always just do it in DevTools so I can see changes on the fly to figure out the problem.
I also used to use the handy "debugger" statement now and then, although I use it less and less these days since it's the same as a breakpoint but takes slightly more effort. Mostly only use it when I already have the code open in my editor and don't feel like manually finding that point in execution in the DevTools....it's kind of like "go-to this line."
Ctrl+P in sources of DevTools allows fuzzy search among files. Which is awesome.
There have been times I've used the XHR breakpoints, Event Listener breakpoints, and DOM breakpoints, but it's really rare for me. Definitely though there are cases where I'm not sure where an event is originating from and these have very much come in handy at those times. Underneath those on the right of the sources you can also see a total list of all active event listeners, which is also nice.
I'll add more thoughts if I think of anything else...I guess I'm mostly talking about my tools here. With regards to my thought process, that's more complex...hmm. I guess I try to figure out what the desired behavior is, try to see what the actual behavior is and how consistent it is, then see if I can find the code that controls that part of the application. If I don't already know, I Inspect Element on the page, find an element id or something, then look it up in our code and follow the trail to whatever's controlling the page and relevant behavior. From there it's just careful examination of the logic to find the flaw, using all the tools above.
As a side note: I find the "debugger" statement does not always trigger properly, making me manually set a breakpoint anyway.
Usually, my goal then is to either:
1) Find a configuration / infra issue we can solve (best outcome for everyone)
2) Give the most info to dev to enable a code fix, and roll back/mitigate in the interim.
In the last few years, people have paid me lots of money to build these really cool ELK or Splunk log-chewing systems for them, which, I have to admit, are utterly useless to me. There are really great monitoring tools which give me historical graphs of stuff I usually don't care about, too... but I, and most of the folks I run with, don't really reach for these tools as the first resort when we hit a prod issue.
Let's say, hypothetically, a customer of mine has an issue where some users are getting timeouts on some API or another. We got alerted through some monitoring or whatever, and so we start taking a look.
The first step, for me, is always to log in to a webserver at random (or all of them) and look at the obvious: logs, errors, dmesg, IO, mem, processes. The pretty-graph ELK tools can tell me this info, but what I want to look at next is easier to jump to when I'm already there than trying to locate IOWAIT in something like Splunk.
All looks good on the web servers? OK, let's check out the DBs in one terminal and the apps in another. You follow the request through the infra. Splunk or ELK can tell me one of my apps is eating 25,000 FDs, but then what? I need to log in anyway. Next on the list are tools like strace/truss, iostat, netstat and so on, which will immediately tell you if it's an infra/load/config issue, and we go from there, down into the rabbit hole.
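As one tiny example of that "on the box" checking, the FD count of a process can be read straight from /proc; a Linux-only sketch, with `pid` defaulting to the current process:

```python
import os

def fd_count(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

print(fd_count())  # open FDs of this process
```

Run it in a loop against the suspect PID and a steadily climbing number confirms the leak before you reach for strace.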
The point I'm trying to make is: for me at least, the tools we're deploying (and being paid well to deploy) like Dataloop, New Relic, Splunk and so on are actually useless for solving real-time prod issues (for me personally, and my crew, at least), because they only expose a very small amount of info, and almost regardless of the issue I'll need to be on the box looking at something unique to the problem, either to explain its impact or to mitigate it.
As I said though, I'm a recovering ops person and I'm doing dev these days. I still tend to use print statements when I hit a bug; although since I'm now mostly doing Erlang, bugs are rare and usually trivial to track down.
Interrogate every assumption you make.
Or the equivalent for whatever language I'm using.
- If I do not know what's the problem, I do everything in my power to reproduce the bug and maybe write a test (as small as possible) that triggers the bug. I enable logging or write a special log function to track relevant state in case the bug is rare.
- Once I know what triggers the bug, I should know the general direction of where in the code it is. I start printing various variables and parameters in case it's a low-hanging fruit like wrong sign or stuff like that.
- If I do not succeed with that, I comment out half of the code and look if the bug persists. If it does, then I know it's in the other half. If it does not, then I know it's in this half. I proceed with this binary search until I am down to 1 statement, which takes a logarithmic amount of steps. I found the bug. I fix it. (This does not work if the bug is in two places or if the bug is a bad memory operation that triggers a bug later)
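That comment-out-half bisection can be sketched as a binary search over a pipeline of steps; the steps and badness predicate below are invented, and it assumes exactly one step introduces the bug, per the caveat above:

```python
def find_faulty_step(steps, data, is_bad):
    """Binary-search for the first step whose inclusion makes the output bad.

    Invariant: running steps[:lo] gives good output, steps[:hi] gives bad.
    """
    lo, hi = 0, len(steps)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        out = data
        for step in steps[:mid]:
            out = step(out)
        if is_bad(out):
            hi = mid
        else:
            lo = mid
    return hi - 1  # index of the step that introduced the bug

steps = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 100,   # the "bug": the value goes negative here
    lambda x: x + 3,
]
faulty = find_faulty_step(steps, 0, lambda x: x < 0)
print(faulty)  # → 2
```

Just as with commenting out code by hand, this pins down the culprit in a logarithmic number of runs instead of a linear one.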
- Real debuggers like valgrind are rarely necessary if you're familiar enough with the program. In fact, unless you're dealing with the hairiest of hairy memory problems and heisenbugs, you probably do not need a debugger at all. Debuggers are useful to debug e.g. generated assembly when you write a compiler.
I have played around with node-inspector, but I have found that it's awfully slow, particularly with really large arrays or objects. So I eventually just abandoned it. It seems like a good idea, and might be worth revisiting in the future.