How Does a Debugger Work?

cplease · on Nov 24, 2014

Nice article, but it doesn't quite deliver. It says, the trick is not "black magic", but then defines debugging in terms of ptrace syscalls, describing the API a little bit, but without giving a clue as to how ptrace actually works. So, ptrace is essentially black magic.

And this is not really an explanation of "how a debugger works," or even "how gdb works." ptrace is just one of several debug targets for gdb. There are simulators, core files, various embedded monitors, VxWorks, Windows, gdb remote debug servers over various interfaces, and on and on. ptrace is irrelevant to other targets.

bcantrill · on Nov 24, 2014

It's a tragedy that ptrace is being portrayed as anything other than an entirely regrettable artifact of history. It's a terrible, terrible interface -- one of the worst in all of Unix, really (and yes, I understand the gravity of that claim). This is reminding me that I really need to write that blog post that excoriates ptrace -- and explain how frustrations with ptrace informed the development of /proc[1][2].

[1] http://dtrace.org/blogs/eschrock/2004/06/25/a-brief-history-...

[2] http://illumos.org/man/4/proc

planckscnst · on Nov 25, 2014

I've been seriously thinking of starting a podcast that layer by layer removes the "magic" behind Linux/Unix and even other OSs, even down to explaining how processors/computers in general work. I'm wondering if people like you have any interest in such a beast.

I have an episode ideas doc [1] for it that is several pages long. I need to flesh out an episode and shoot a test. After that, I need to figure out subject order.

[1] https://docs.google.com/document/d/1Nn01yDS5PkiegHq4Oxz43xUQ...

agumonkey · on Nov 25, 2014

Gonna throw it just in case, I'm very fond of Massalin's thesis http://valerieaurora.org/synthesis/SynthesisOS/

A jitted, partial-evaluation-capable kernel. The thesis is very readable, proof: I could enjoy almost all of it and I'm a monkey. It might give some insights about kernel design that may not be told in mainstream ones.

ps: mandatory wikiwiki page http://c2.com/cgi/wiki?SynthesisOs

6chars · on Nov 25, 2014

Please do this! I am always so intrigued by systems, but get intimidated by just how wide and deep modern operating systems are. The world needs more technically informative podcasts. I'm a huge fan of the format, but most podcasts I've listened to, even technology ones, have been very fluffy.

ultramancool · on Nov 24, 2014

Indeed, and some of the things in this article are blatantly wrong, such as using an "invalid instruction" to cause a signal we can catch.

Debuggers do NOT use invalid instructions. On x86 and AMD64 they use 0xCC (INT 0x03); on other platforms they use whatever dedicated trap-to-debugger instruction there may be. There's not usually a need to use an invalid instruction to do debugging.

pm90 · on Nov 24, 2014

Hey cplease, could you please write an article explaining all those concepts? I (and I'm sure a lot of others) would greatly appreciate that!

Thanks :)

wazari972 · on Nov 24, 2014

I'll update the article to explain a bit more why debugging is described in terms of ptrace: all the other targets work the same way! I studied simulators and corefiles, and they both provide to GDB an interface that is very similar to ptrace. Remote debugging just applies serialization to that interface.
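To make the "serialization" point a bit more concrete, here is a minimal sketch of how the GDB Remote Serial Protocol frames a request: the payload goes between `$` and `#`, followed by a two-hex-digit checksum (the sum of the payload bytes modulo 256). The helper names are made up for illustration, and the sketch ignores acknowledgements and the escaping of special characters, so treat it as a toy rather than a working stub.

```python
def rsp_checksum(payload: str) -> str:
    """Two lowercase hex digits: sum of the payload bytes modulo 256."""
    return format(sum(payload.encode("ascii")) % 256, "02x")


def rsp_frame(payload: str) -> str:
    """Wrap a payload as $<payload>#<checksum>."""
    return f"${payload}#{rsp_checksum(payload)}"


def rsp_unframe(packet: str) -> str:
    """Check framing and checksum, then return the bare payload."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload, checksum = packet[1:-3], packet[-2:]
    if rsp_checksum(payload) != checksum.lower():
        raise ValueError("bad checksum")
    return payload


# "g" asks the remote stub for all registers; "m<addr>,<length>" reads memory.
# The address below is an arbitrary example value.
read_registers = rsp_frame("g")
read_memory = rsp_frame("m4005d0,4")
```

The requests themselves (read registers, read memory) look a lot like what a ptrace-based backend would do locally, which is the similarity being described.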

alain94040 · on Nov 24, 2014

Not true in the embedded world. Remote gdb will use JTAG commands, which are fascinating. Basically it's a whole different world. Processors have dedicated hardware support for debugging, far beyond traps and exceptions.

aidenn0 · on Nov 24, 2014

It's amazing that so many embedded processors do this now. Back in the day, you needed to use a special-purpose processor (which had extra pinouts) to do this as part of an in-circuit emulator.

wazari972 · on Nov 24, 2014

I never had access to the low-level part of embedded systems, although I did my PhD on embedded systems debugging in an embedded system company! What I saw was what was running on the PC and communicating with the JTAG external interface (basically, a `gdbserver` implementation). At that level, that was nothing more than ptrace. Under that, I can't say!

ChuckMcM · on Nov 24, 2014

One of the more interesting things about the ARM Cortex-M series is that debugging is "built in" to the CPU core on all licensed processors. No hacks required. Something that I'm sure x86 machines would have had, if transistors had been as cheap then as they are now. Of course early on Intel made even more margin on versions of the processor used for doing in-circuit emulation by 'bonding out' to an unused pad access to internal trace registers.

xxxyy · on Nov 24, 2014

x86 does also have debugging registers:

http://en.wikipedia.org/wiki/X86_debug_register

They can be used from the Windows Kernel Debugger (WinDbg), although I'm not sure if/how Linux-based gdb uses them.

MaulingMonkey · on Nov 25, 2014

I'm pretty certain that Visual Studio uses those for "memory breakpoints", and that GDB uses those for "hardware watchpoints", to use their respective nomenclature. And these are very handy at times.

msvan · on Nov 24, 2014

I'm not too knowledgeable about this subject, but I've been interested in learning how native code debuggers work for a long time. One thing I wonder is, if the debugger inserts an invalid instruction or a hardware breakpoint instruction into the code at runtime, wouldn't all of the in-memory code need to be reallocated and recalculated in order to make room for the new instruction and recalculate jump addresses? How is this handled?

yan · on Nov 24, 2014

A breakpoint instruction is usually a single byte in variable length instruction ISAs and the same width as any other instruction on fixed width ISAs, so inserting a breakpoint is just overwriting the first byte of an instruction and keeping all other offsets identical.
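A small sketch of that overwrite, using a Python `bytearray` as a stand-in for the debuggee's code (a real debugger would write into the target process instead). The function names and the `saved` table are illustrative assumptions; the bytes are a tiny x86-64 sequence (`push rbp; mov rbp, rsp; pop rbp; ret`).

```python
INT3 = 0xCC  # one-byte x86 breakpoint opcode


def insert_breakpoint(code: bytearray, addr: int, saved: dict) -> None:
    """Remember the original byte at addr, then overwrite it with INT3."""
    saved[addr] = code[addr]
    code[addr] = INT3


def remove_breakpoint(code: bytearray, addr: int, saved: dict) -> None:
    """Put the original byte back."""
    code[addr] = saved.pop(addr)


original = bytes(b"\x55\x48\x89\xe5\x5d\xc3")
code = bytearray(original)
saved = {}

# Break on the 3-byte "mov rbp, rsp" that starts at offset 1.
insert_breakpoint(code, 1, saved)
patched = bytes(code)

remove_breakpoint(code, 1, saved)
```

Only one byte changes and the buffer keeps its length, so every other instruction stays at the same address and no jump target needs recalculating.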

lostpixel · on Nov 24, 2014

As someone who used to play with debugger implementations a bunch it's nice to see some articles digging into this.

Only feedback I would give is to remove the shadow on your text, I had to manually disable the shadow before I was able to read :).

wazari972 · on Nov 24, 2014

thanks,

regarding your feedback, I don't see any shadow in FF! The CSS is not mine, and there's indeed a `text-shadow` set for the text so I've removed it. Let me know if it's still bad, and maybe which browser you're using

LatencyKills · on Nov 24, 2014

Seconded.

jbn · on Nov 25, 2014

Also relevant, and a good read to boot: http://www.cs.tufts.edu/~nr/pubs/retargetable-abstract.html

agumonkey · on Nov 25, 2014

Just `gdb gdb`. Might please infinite interpretation towers lovers around here.

esfandia · on Nov 24, 2014

Based on the title of the article, I expected it to describe very general principles for writing debuggers, but it seems very specific to gdb. Are things similar in, say, Python or Java?

chrisseaton · on Nov 25, 2014

I wrote a paper about a very different way to debug Ruby http://www.chrisseaton.com/rubytruffle/set_trace_func/ by AST rewriting and dynamic deoptimization.

concernedctzn · on Nov 24, 2014

Greyhat Python: http://www.nostarch.com/ghpython.htm

covers this topic, as well as writing a debugger, and basic fuzzing.

electrum · on Nov 25, 2014

The Java platform has built in support for debuggers: https://docs.oracle.com/javase/jp/8/technotes/guides/jpda/

MichaelGG · on Nov 24, 2014

So how are the ptrace functions implemented? Is the "hack" of inserting invalid instructions used even for single stepping? (Though hardware breakpoints are probably easier?)

barrkel · on Nov 24, 2014

Instructions that trigger a CPU interrupt, or other means of handing control back to the OS, are used for single-stepping at the source level, yes.

I think "hack" is a little too strong a word. CPUs, historically, have had too little state to usefully track all the execution breakpoints you might want to set. So they normally have a nice short instruction that the OS can trap. On x86, it's INT 3, aka interrupt 3, encoded in a single byte, 0xCC. Interrupts are handled by entries in the interrupt vector table (IVT), normally in a region of memory only the OS can write to (if you're not doing something like running DOS in real mode). So when the CPU tries to execute 0xCC, it instead calls code located in the fourth entry (3 counted from 0) in the IVT. This will be filled by the OS, for modern values of OS.

(Single stepping CPU instructions is done with mode flags on the CPU for x86; this lets the CPU single-step things like REPNZ prefix used with string instructions).

spc476 · on Nov 25, 2014

It depends upon the CPU. For instance, the 8086 (on up) includes a bit in the condition code register to enable "single step mode" where it runs a single instruction then traps. It makes single stepping easy.

Other CPUs don't have that. So on a 6502, to do single stepping, you write BRK (software interrupt) over the first byte of the next instruction. When handling the BRK, you write the original instruction back into memory, then insert BRK at the next instruction. The hard part in all this is finding the next instruction. Not only do you have to decode the instructions to find it, but you also have to handle conditional branches, which either involves figuring out which path will be taken and inserting the BRK in the appropriate location, or inserting the BRK into both paths (and having to restore both later). This also precludes single stepping through ROM.
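Here is a rough sketch of the "BRK on both paths" variant. It only knows five 6502 opcodes (LDX immediate, DEX, BNE, NOP, JMP absolute), memory is a plain `bytearray`, and all function names are invented for the example, so it is a toy decoder and nothing like a complete one.

```python
BRK = 0x00
LDX_IMM, DEX, BNE, NOP, JMP_ABS = 0xA2, 0xCA, 0xD0, 0xEA, 0x4C
LENGTHS = {LDX_IMM: 2, DEX: 1, BNE: 2, NOP: 1, JMP_ABS: 3}


def step_targets(mem, pc):
    """Addresses where the instruction after the one at pc might start."""
    opcode = mem[pc]
    fallthrough = pc + LENGTHS[opcode]
    if opcode == BNE:
        # Relative branches use a signed 8-bit offset from the next instruction.
        raw = mem[pc + 1]
        offset = raw - 256 if raw >= 0x80 else raw
        return {fallthrough, fallthrough + offset}
    if opcode == JMP_ABS:
        # Little-endian 16-bit absolute target.
        return {mem[pc + 1] | (mem[pc + 2] << 8)}
    return {fallthrough}


def plant_step_breaks(mem, pc):
    """Write BRK at every possible successor; return the bytes that were there."""
    saved = {}
    for addr in step_targets(mem, pc):
        saved[addr] = mem[addr]
        mem[addr] = BRK
    return saved


def restore(mem, saved):
    """Undo plant_step_breaks."""
    for addr, byte in saved.items():
        mem[addr] = byte


# 0: LDX #3   2: DEX   3: BNE -3 (back to 2)   5: NOP
program = bytearray([LDX_IMM, 0x03, DEX, BNE, 0xFD, NOP])
original = bytes(program)

targets = step_targets(program, 3)      # stepping the BNE at address 3
saved = plant_step_breaks(program, 3)
patched = bytes(program)
restore(program, saved)
```

Stepping the BNE needs two BRKs (the loop head at 2 and the fall-through at 5), and both original bytes have to be put back afterwards, which is the extra bookkeeping mentioned above.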

tryp · on Nov 24, 2014

An invalid or debug instruction causes the processor to throw an exception and jump to a specific address where "handler" code should exist. This is managed by the OS, so in this case, the OS would look up the proper thing to do as configured by the earlier ptrace calls -- probably executing your debugger code to let you inspect variables and memory. When you're done inspecting, the proper instruction gets re-installed in the target process, the OS restores state saved by the exception trap, and transfers execution back to the debugged process. Most processors have a single-step flag, so that the next instruction executes and the exception is triggered before the following one.
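The restore / single-step / re-arm cycle can be sketched with a toy target. Everything here is an assumption made for illustration: `ToyTarget` is a made-up CPU where every instruction is one byte, 0xCC traps and anything else is a no-op, and the debugger pokes memory directly instead of going through the OS.

```python
INT3 = 0xCC
NOP = 0x90


class ToyTarget:
    """Toy CPU: one-byte instructions; 0xCC traps, anything else is a no-op."""

    def __init__(self, code):
        self.mem = bytearray(code)
        self.pc = 0
        self.executed = []  # addresses of the real instructions that ran

    def step(self):
        """Execute one instruction; return True if it trapped."""
        opcode = self.mem[self.pc]
        self.pc += 1  # as on x86, pc already points past the trap byte
        if opcode == INT3:
            return True
        self.executed.append(self.pc - 1)
        return False

    def run(self):
        """Run until a trap or the end of memory; return True on a trap."""
        while self.pc < len(self.mem):
            if self.step():
                return True
        return False


def resume_over_breakpoint(target, saved):
    """Restore the original byte, single-step it, then re-arm the breakpoint."""
    addr = target.pc - 1
    target.mem[addr] = saved[addr]
    target.pc = addr
    target.step()
    target.mem[addr] = INT3


target = ToyTarget([NOP, NOP, NOP, NOP])
saved = {2: target.mem[2]}
target.mem[2] = INT3  # breakpoint on the third instruction

hit = target.run()
trap_pc = target.pc
resume_over_breakpoint(target, saved)
trapped_again = target.run()
```

The single step in the middle is what lets the breakpoint stay armed for the next time execution reaches that address.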

jesuslop · on Nov 24, 2014

Most probably you use uP help. In x86 you activate the trap flag [1] and get a specific interrupt called at each instruction step; there you can do more software debugging or instead use hardware breakpoints and other hardware supported debugging as in [2].

[1] https://en.wikipedia.org/wiki/Trap_flag [2] https://en.wikipedia.org/wiki/X86_debug_register

wazari972 · on Nov 25, 2014

I've updated the article with a section on `How is Ptrace implemented?`, I hope it will answer your question!

wazari972 · on Nov 25, 2014

I've updated the article based on your comments, thanks :-) (title more focused, How is Ptrace implemented, What about systems without Ptrace)

omegote · on Nov 24, 2014

By the way, it'd be seriously cool to find a good tutorial about gdb. I've been using it for years, but just the basic operations...

concernedctzn · on Nov 24, 2014

Even just knowing the basic operations is a great starting point. One cool thing I've been doing recently is using the python interface to script it for those times when I'm trying to find something particularly hard to repro that involves a lot of searching. Surprisingly easy to work with.

cbab · on Nov 25, 2014

A great book on the subject is "The Art of Debugging with GDB, DDD, and Eclipse" [1].

[1] - http://www.nostarch.com/debugging.htm

tcas · on Nov 24, 2014

Don't modern processors support hardware breakpoints / watchpoints?

tryp · on Nov 24, 2014

There is typically only one hardware instruction pointer breakpoint (or a small set of them). Every time the CPU is about to execute an instruction, it compares IP to the IP breakpoint register (or set of registers) and if they match, a debug exception is thrown. These are what you must use to debug the BIOS/bootloader executing in-place from ROM with a JTAG or XDP hardware debugger attached. Later, software can add the proper exception handler once enough of the hardware platform is configured. (Not all architectures exit power-on reset with exceptions enabled or even exception vectors mapped to a valid memory location.)

If the code you are debugging is executing out of write-able memory, then your debugger can support an effectively infinite list of breakpoints by writing some instruction that causes an exception to be thrown anywhere you want a break, then handling that exception by looking up the faulting instruction address in your breakpoint list.

Most processors also give you hardware data access breakpoints as well. These typically sit on the CPU's data bus interface and can fire when the address or data that's about to hit the bus match the respective breakpoint registers. There is usually an option to only trap reads, writes, or both. Sometimes you get interesting things like a mask register that lets you trap on a whole block of memory.
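A sketch of the matching logic for such a data breakpoint, modelled in software. The `Watchpoint` class and its fields are hypothetical and do not correspond to any particular CPU's registers; the mask convention (set bits are compared, cleared bits are "don't care") is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Watchpoint:
    addr: int       # address to watch
    mask: int       # set bits are compared; cleared bits are "don't care"
    on_read: bool
    on_write: bool

    def hits(self, access_addr: int, is_write: bool) -> bool:
        """Would this bus access fire the watchpoint?"""
        wanted = self.on_write if is_write else self.on_read
        return wanted and (access_addr & self.mask) == (self.addr & self.mask)


# Trap writes anywhere in the 256-byte block starting at 0x2000.
block_writes = Watchpoint(addr=0x2000, mask=0xFFFFFF00, on_read=False, on_write=True)

write_inside = block_writes.hits(0x20A4, is_write=True)
read_inside = block_writes.hits(0x20A4, is_write=False)
write_outside = block_writes.hits(0x2100, is_write=True)
```

Clearing the low eight mask bits is what turns a single-address watch into a watch on a whole aligned block.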

One of the most interesting hardware debug features of modern processors is the branch trace which keeps a running list of the last N branch instructions that lets you reconstruct a "how did we get here" story.
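In software terms, a branch trace behaves like a fixed-size ring buffer of (from, to) pairs. The sketch below is a model of the idea only: `BranchTrace` is an invented name, and real hardware exposes this through its own registers or trace buffers.

```python
from collections import deque


class BranchTrace:
    """Keeps only the most recent `depth` branches, oldest first."""

    def __init__(self, depth: int):
        self.records = deque(maxlen=depth)

    def record(self, source: int, destination: int) -> None:
        # Once the buffer is full, appending silently drops the oldest entry.
        self.records.append((source, destination))

    def history(self):
        return list(self.records)


trace = BranchTrace(depth=3)
for source, destination in [(0x10, 0x40), (0x44, 0x80), (0x88, 0x20),
                            (0x24, 0x60), (0x64, 0x90)]:
    trace.record(source, destination)

last_branches = trace.history()
```

Reading the list from oldest to newest gives the "how did we get here" story for the last few control transfers.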

MrBuddyCasino · on Nov 24, 2014

Do you know if VMs like the JVM depend on hardware features also? I suspect that would be more an optimization, not a necessity though, since they should be able to de-optimize jitted native code on stack frames containing breakpoints?

bravo22 · on Nov 24, 2014

Most certainly do.

pmalynin · on Nov 24, 2014

Didn't read the article (I find the topic rather bland), but debugging on x86-64 is quite simple: you have your debugging registers (DR0-DR3) which set trigger addresses, along with conditions (execute, read, write), and then call a system interrupt when the condition is satisfied. This approach is limited to 4 breakpoints. Most modern debuggers do software breakpoints, that is, when you set a breakpoint for a particular line or instruction, the debugger replaces the first byte of the instruction with an int3 instruction (usually interrupt instructions are two bytes wide, so technically int3 and "int 3" are different), but regardless, the debugger usually stores the actual instruction byte in a table to replace the int3 when it is actually hit. I suppose one could do this differently by causing a page fault (a simple bit switch from present to not present in the page table) and then monitoring the CR2 register to get the address of the executing code or the data that is being accessed. One point I forgot to mention is that the x86 has hardware support for single-stepping instructions (a simple flag). But all of these methods require operating system support.
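For the debug-register route, the conditions live in the DR7 control register while DR0 through DR3 hold the addresses. A sketch of how the DR7 bits for one slot are composed, following the layout in the Intel manuals as I understand it (local-enable bit at position 2*slot, a 4-bit R/W plus LEN field starting at bit 16 + 4*slot). The function name is made up, and actually loading the value into the register needs kernel help, which is not shown.

```python
# R/W field: what kind of access triggers the breakpoint.
RW_EXECUTE, RW_WRITE, RW_READ_WRITE = 0b00, 0b01, 0b11
# LEN field: size of the watched region (must be LEN_1 for execute breakpoints).
LEN_1, LEN_2, LEN_8, LEN_4 = 0b00, 0b01, 0b10, 0b11


def dr7_enable(dr7: int, slot: int, rw: int, length: int) -> int:
    """Return dr7 with breakpoint `slot` (0..3) locally enabled and configured."""
    if not 0 <= slot <= 3:
        raise ValueError("x86 has four debug address registers, DR0 to DR3")
    shift = 16 + 4 * slot
    dr7 &= ~(0b1111 << shift)               # clear the old R/W and LEN bits
    dr7 |= (rw | (length << 2)) << shift    # R/W in the low two bits, LEN above
    dr7 |= 1 << (2 * slot)                  # local enable bit for this slot
    return dr7


watch_write_4 = dr7_enable(0, slot=0, rw=RW_WRITE, length=LEN_4)
break_exec = dr7_enable(0, slot=1, rw=RW_EXECUTE, length=LEN_1)
```

Each slot's address goes in the matching DRn register; DR7 only says which slots are active and what kind of access should fire them.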