
How breakpoints are set - luu
http://majantali.net/2016/10/how-breakpoints-are-set/
======
pcwalton
Also worth noting: x86 supports four hardware breakpoints, which are sometimes
more convenient, since they don't involve overwriting any instructions.
(They're also more flexible because they can be used to trap on memory reads,
etc.)
[https://en.wikipedia.org/wiki/X86_debug_register](https://en.wikipedia.org/wiki/X86_debug_register)

~~~
wazari972
I thought that these HW 'breakpoint' registers were (only) used to implement
watchpoint, that is, read/write operations on data. Can they be used to
implement 'instruction breakpoints'?

~~~
pcwalton
You can set the "break on" bitfield to "execute" (0) to have the debug
register function as a normal breakpoint.
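
A rough sketch of what that bitfield looks like, assuming the standard DR7 layout (local-enable bits in the low byte, then a 2-bit R/W field and a 2-bit LEN field per slot starting at bit 16). The helper name `dr7_for_breakpoint` is made up for illustration; the breakpoint address itself would go in DR0-DR3, which this sketch does not model.

```python
# Sketch of the x86 DR7 control-register layout. The helper name is
# hypothetical; it is not part of any debugger API.
RW_BITS = {"execute": 0b00, "write": 0b01, "io": 0b10, "readwrite": 0b11}
LEN_BITS = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}

def dr7_for_breakpoint(slot, kind, length=1, dr7=0):
    """Return dr7 with hardware breakpoint `slot` (0-3) locally enabled."""
    if slot not in range(4):
        raise ValueError("x86 has only four debug address registers (DR0-DR3)")
    if kind == "execute" and length != 1:
        raise ValueError("execute breakpoints must use length 1")
    dr7 |= 1 << (slot * 2)              # L<slot>: local enable bit
    shift = 16 + slot * 4               # R/W field; LEN sits two bits above it
    dr7 &= ~(0b1111 << shift)           # clear any previous R/W and LEN bits
    dr7 |= RW_BITS[kind] << shift
    dr7 |= LEN_BITS[length] << (shift + 2)
    return dr7

print(hex(dr7_for_breakpoint(0, "execute")))    # instruction breakpoint in slot 0
print(hex(dr7_for_breakpoint(1, "write", 4)))   # 4-byte write watchpoint in slot 1
```

With "execute" the R/W field stays 0, so an instruction breakpoint in slot 0 only needs the enable bit set.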

------
dkopi
The reason INT 3 is used is that it's the only interrupt that has a single
byte opcode (0xCC). Other interrupts require two bytes: CD <interrupt number>.

This makes setting a breakpoint really easy, as all you have to do is replace
a single byte (and restore a single byte) where you want to place your
breakpoint. INT 3 being only one byte is also important when you're setting a
breakpoint in place of another single-byte instruction: your newly set
breakpoint won't overwrite the following instruction, which might be a jump
target from somewhere else in the code.
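
A minimal sketch of that save/patch/restore bookkeeping, using a `bytearray` as a stand-in for the debuggee's code. A real debugger would write through something like ptrace instead, and the `BreakpointTable` class is hypothetical.

```python
INT3 = 0xCC  # single-byte x86 breakpoint opcode

class BreakpointTable:
    """Remembers original bytes so patched breakpoints can be removed again."""

    def __init__(self, memory):
        self.memory = memory   # bytearray standing in for the debuggee's code
        self.saved = {}        # address -> original byte

    def set(self, addr):
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = INT3

    def clear(self, addr):
        self.memory[addr] = self.saved.pop(addr)

# nop; ret; nop: three one-byte x86 instructions
code = bytearray([0x90, 0xC3, 0x90])
table = BreakpointTable(code)
table.set(1)
patched = bytes(code)    # only the byte at offset 1 changed
table.clear(1)
restored = bytes(code)   # back to the original bytes
```

Because the patch is one byte wide, the instructions on either side of the `ret` are never touched.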

~~~
colejohnson66
> The reason INT 3 is used is that it's the only interrupt that has a single
> byte opcode (0xCC). Other interrupts require two bytes: CD <interrupt
> number>.

It's kind of the other way around. INT3 has a single-byte opcode because
Intel wanted it to be used for breakpoints, so they designated 0xCC for it.
In fact, 0xCD 0x03 works, but just isn't used.

~~~
vardump
Because x86 instructions can straddle 4/8/16-byte alignment boundaries, you
can't safely write a multi-byte breakpoint in all cases. The CPU might execute
the two-byte instruction (bytes [0xCD, x] -> int x) after the opcode byte has
become visible but before the operand byte has, and raise some other
interrupt, numbered by whatever byte happened to be at that address before.

------
stinos
_You can consider your debugger to be a program which forks() to create a
child process and then calls execl() to load the process we want to debug_

That is one way to look at it, but I find it a bit too limiting (debuggers can
attach to an existing process as well) and too confusing (it requires knowing
what fork and execl do, and are those even used when attaching to an existing
process?). The functions used also obviously come from a Linux background.
Nothing wrong with that, on the contrary, but I can imagine Windows people or
beginners still having no clue whatsoever about a debugger after reading this,
though they're likely not the target group.

~~~
hornetblack
There is also remote debugging. Via network or JTAG.

------
jack9
I went 20 years without knowing how breakpoints work because they always just
did (when they were available). Reading this, it's unsurprising how the tool
works. That's the best kind of tool.

~~~
dkopi
What's even better is that breakpoints haven't really changed over that period
of time. They just work.

While a lot of tech is rapidly moving and constantly changing - this is the
type of fundamental knowledge that will probably prove valuable for the rest
of your career.

~~~
bbcbasic
Is this pretty much how all breakpoints work even in higher level languages
with an IL/Bytecode etc.?

~~~
pcwalton
Usually they either run debugged code in the interpreter or recompile the JIT
code in a special instrumentation mode. (Not having to monkey patch the code
at runtime—being able to do a "proper" recompilation—is one of the advantages
of having a JIT!)

See this explanation of how it works in SpiderMonkey, for instance:
[http://rfrn.org/~shu/2014/11/20/speeding-up-debugger.html](http://rfrn.org/~shu/2014/11/20/speeding-up-debugger.html)

~~~
pjmlp
Especially when coupled with the ability to connect live to a production
instance and do all sorts of monitoring and reporting.

Of course, one needs to take care of the respective secure access. :)

------
d23
Weird, I just happened to be reading about this topic last night. If you liked
this, make sure to check out the articles by Eli Bendersky in the footer. He
has a 3 part series on how debuggers work:

[http://eli.thegreenplace.net/2011/01/23/how-debuggers-work-p...](http://eli.thegreenplace.net/2011/01/23/how-debuggers-work-part-1)
[http://eli.thegreenplace.net/2011/01/27/how-debuggers-work-p...](http://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints)
[http://eli.thegreenplace.net/2011/02/07/how-debuggers-work-p...](http://eli.thegreenplace.net/2011/02/07/how-debuggers-work-part-3-debugging-information)

------
jtchang
How would a program detect the use of a debugger? I know a lot of crackmes and
other anti-piracy measures involve detecting the use of one but am not sure
how they do it. Do they just look for running processes with a debugger
signature, like SoftICE?

~~~
j4_james
One technique I remember being used in DOS apps from many years ago was that
the code would be encrypted in such a way that the next instruction to be
executed would only be decrypted immediately before it was run. This was
achieved by setting up the single step interrupt as the decrypter, and running
the code in single step mode.

The fact that the code was encrypted meant the debugger couldn't disassemble
it in any meaningful way, and also made it impossible to set a breakpoint
(since the breakpoint would just end up being "decrypted" into some other
opcode that would inevitably crash). The debugger also couldn't step through
the code, because taking over the single step interrupt would prevent the
decrypter from running, so you'd just be stepping through garbage.

The way I worked around this was by writing a debugger that could hook the
single step interrupt in such a way that it still forwarded the interrupt onto
the previous hook. I still couldn't set breakpoints, but I could step through
the code, watching it decode itself as it proceeded.

~~~
caf
If you're single-stepping you don't need to patch the instruction stream for a
breakpoint; you can just test the instruction pointer against the break
address at each step.
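
As a toy illustration of that idea: the loop below "single-steps" a made-up machine and compares the program counter against a set of break addresses after every step, without ever patching code. `run_until` and the `step` callback are hypothetical names; on real hardware the step would be the trap-flag or trace exception.

```python
def run_until(step, pc, break_addrs, max_steps=1000):
    """Single-step via `step(pc) -> next_pc` until pc lands on a break address.

    Returns the address that was hit, or None if max_steps ran out first.
    """
    for _ in range(max_steps):
        pc = step(pc)
        if pc in break_addrs:
            return pc
    return None

# Toy program: addresses 0..5, each instruction falls through to the next,
# except the one at address 5, which jumps back to 0.
def toy_step(pc):
    return 0 if pc == 5 else pc + 1

hit = run_until(toy_step, 0, {4})
missed = run_until(toy_step, 0, {9}, max_steps=50)   # 9 is never executed
```

The cost is one trap per instruction, which is why patched or hardware breakpoints are preferred when they are available.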

~~~
Annatar
Not every processor has the means to perform direct comparisons against the
program counter; as far as I know, the 6502 and MC68000 families cannot do
direct comparisons with the program counter.

~~~
caf
It's the program counter of the state that was interrupted by the single-step
trap that's of interest, and _that_ program counter is generally pushed onto
the stack by the trap.

For example, this is how the trace exception on the M68k works - the program
counter of the next instruction to be executed can be read off the stack by
the exception handler. The 6502 doesn't have built-in software single-stepping
but the same effect was sometimes achieved by tying a short timer to the NMI -
and when the NMI is asserted, the interrupted program counter is pushed onto
the stack.

------
gulpahum
Nice explanation.

However, it didn't explain how the debugger can stop at the breakpoint again
after that last step. The interrupt instruction has been replaced with the
original instruction, so the process won't stop there again.

~~~
dimfeld
Usually the debugger just replaces the breakpoint instruction with the
original byte, then resumes execution in a single-step mode that causes the
CPU to just execute a single instruction before firing another interrupt.
After that, the debugger sets the breakpoint again and resumes normal
execution.
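
Here is a small simulation of that restore / single-step / re-arm sequence. It runs on a toy machine, not real x86: `JUMP_TO_START` is an invented opcode, and `step`, `run` and `resume_from_breakpoint` are hypothetical helpers, so treat it as a sketch of the control flow only.

```python
INT3 = 0xCC            # real x86 breakpoint opcode
NOP = 0x90             # real x86 nop
JUMP_TO_START = 0x01   # toy opcode (not x86): jump to address 0

def step(mem, pc):
    """Execute one toy instruction; return (next_pc, trapped)."""
    op = mem[pc]
    if op == INT3:
        return pc + 1, True      # as on x86, the trap reports the address after INT3
    if op == JUMP_TO_START:
        return 0, False
    return pc + 1, False         # NOP

def run(mem, pc, max_steps):
    """Run until a breakpoint traps or max_steps instructions have executed."""
    for _ in range(max_steps):
        pc, trapped = step(mem, pc)
        if trapped:
            return pc, True
    return pc, False

def resume_from_breakpoint(mem, pc_after_trap, original):
    """Step over the breakpoint at pc_after_trap - 1, then re-arm it."""
    bp = pc_after_trap - 1       # rewind to the patched instruction
    mem[bp] = original           # 1. restore the original byte
    pc, _ = step(mem, bp)        # 2. single-step the real instruction
    mem[bp] = INT3               # 3. put the breakpoint back
    return pc

mem = bytearray([NOP, NOP, NOP, JUMP_TO_START])
original = mem[1]
mem[1] = INT3                    # breakpoint inside the loop body
hits, pc = 0, 0
for _ in range(3):
    pc, trapped = run(mem, pc, 100)
    if not trapped:
        break
    hits += 1
    pc = resume_from_breakpoint(mem, pc, original)
```

The breakpoint fires on every pass through the loop because it is re-armed right after the single step.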

~~~
to3m
When single-stepping, it's necessary to step only one thread, to ensure that
other threads don't skip the (temporarily) disabled breakpoint. There's a
paper here that discusses one solution to the problems this causes:
[http://www.bmrtech.com/uploadfile/image/whitepaper/mentorpap...](http://www.bmrtech.com/uploadfile/image/whitepaper/mentorpaper_multcore_db.pdf)

(You can engineer a deadlock in gdb due to this, e.g., on x64, by stepping
over a SYSCALL instruction that reads from a pipe that's about to be filled by
another thread. But you're unlikely to experience this in practice, as system
calls are wrapped by a glibc function, and you'll probably be stepping over
that rather than the instruction directly.)

------
tavish1
Any comments on how breakpoints work on external targets running bare-metal,
where you can't replace instructions? E.g. debugging an 8-bit AVR, say an
ATmega1280, over JTAG. I am guessing it has to do with the JTAG hardware doing
a simple compare of the PC with the breakpoint address; I just want
confirmation and more details.

~~~
TickleSteve
Yeah, processors such as the ARM Cortex-M series have built-in debug features
such as ~8 breakpoint registers that trigger a change in processor state when
they match a code or data access address.

The breakpoint registers are accessed via JTAG/SWD using your
J-Link/FET/whatever.

Quite often, when you're debugging embedded systems, you run out of "hardware"
breakpoints and have to resort to software-style breakpoints described in the
article.

~~~
sigill
Yes. And these software breakpoints are written into flash. This is why
they're very slow - and why they wear down your flash.

The situation is kinda OK when you don't often change breakpoints and your CPU
has an instruction register writable by the debug probe via JTAG/SWD/etc. Upon
stepping or continuing from a breakpoint, the debugger will write the actual
instruction at the breakpoint into the instruction register and tell the CPU
"run again, but don't load the instruction from memory as I have already
loaded it into your instruction register.".

Another option is to emulate the effects of the instruction in the debugger
and write the results back into registers/RAM/I/O. This is not always
possible.

If you don't have the options above, your flash will wear down quickly, as
stepping away or continuing from a breakpoint entails writing the actual
instruction back, stepping, then writing the breakpoint again.

~~~
3chelon
Is "wearing down the flash" really still an issue? Hard drives these days are
flash-based - surely the read/write cycles modern flash memory can withstand
are high enough to manage a few breakpoints?

~~~
repiret
Yes. In an embedded environment, it's not uncommon to:

1\. Have a NOR flash part rated for 10,000 or fewer cycles.

2\. Have no facility for remapping bad sectors.

3\. Have no wear-leveling mechanism.

All of this is reasonable for a product that will only be flashed once during
manufacturing (and there are a lot of those products) or a product that will
receive a firmware update a single-digit number of times in its lifetime.

In contrast to an SSD, where:

1\. NAND flash is used, with 100,000 to 1,000,000 write cycles.

2\. The drive can transparently remap bad sectors, so flash can start to fail
before anybody notices.

3\. The drive performs automatic wear leveling - if you try to write a single
sector a million times, the drive will do something closer to writing a
million sectors once.
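
To put rough numbers on the difference, here is a toy model. Round-robin remapping is only a stand-in for whatever a real controller does, the 100-sector pool is arbitrary, and the step-over figure assumes every breakpoint rewrite costs one full erase cycle; both function names are made up.

```python
def max_wear(writes, sectors, wear_leveling):
    """Highest per-sector erase count after rewriting one logical sector `writes` times."""
    counts = [0] * sectors
    for i in range(writes):
        # Without wear leveling every rewrite lands on the same physical sector;
        # with it (modeled here as simple round-robin) rewrites are spread out.
        physical = i % sectors if wear_leveling else 0
        counts[physical] += 1
    return max(counts)

def step_overs_until_rated_limit(rated_cycles, rewrites_per_step_over=2):
    """Step-overs of a flash breakpoint before one sector reaches its rating.

    Assumes each step-over rewrites the sector twice (restore the instruction,
    then write the breakpoint back) and that each rewrite costs one cycle.
    """
    return rated_cycles // rewrites_per_step_over

print(max_wear(10_000, 100, wear_leveling=False))   # all wear on one sector
print(max_wear(10_000, 100, wear_leveling=True))    # wear spread over the pool
print(step_overs_until_rated_limit(10_000))
```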

~~~
vardump
> NAND flash is used, with 100,000 to 1,000,000 write cycles

Damn, where can you get flash chips with even 100k cycles of endurance? I'd
like to place a _large_ order. 64 Gbit chip, please.

1000-3000 P/E cycles is typical endurance for MLC NAND flash, not 100-1000k.
SLC chips would fare better, but larger ones are too expensive for typical
applications.

------
pmalynin
Others have mentioned hardware breakpoints but I also want to mention another
type of breakpoint that can be used for both code and data and does not
require program modification. And I can name at least one debugger that uses
this method. The method is hooking Page Faults. Set the present flag of the
page you want to debug to false and you'll be notified of any code that is
executed there. But you can also be notified when data is accessed and by
which instruction and address.

On Windows this can be done with SEH and Linux has its own thing too.

------
Annatar
1\. contemporary central processing units have large instruction and data
caches.

2\. ptrace() modifies the currently debugged program.

3\. if the program currently being debugged is sufficiently small, it might
end up in the processor's instruction cache.

4\. instruction cache invalidation, at least on some processors like MC68000
family, causes crashes.

5\. since ptrace() effectively performs the equivalent of self-modifying code,
how is instruction cache invalidation avoided?

------
Shivetya
So when stepping through, are debuggers just using breakpoints over and over?
How is stepping through different from setting a breakpoint, if at all?

------
pjmlp
No mention of hardware breakpoints?

~~~
vardump
Indeed, x86 hardware breakpoints are way more powerful and don't require
changing instructions. The only problem is that there are just 4 of them, but
that's usually enough. It's nice to be able to set a breakpoint on data access
as well; that's often more useful than breaking on instructions.

[https://en.wikipedia.org/wiki/X86_debug_register](https://en.wikipedia.org/wiki/X86_debug_register)

------
apaprocki
Fun tidbit: AIX ptrace() doesn't support single stepping (PT_STEP). So the
way that single step is implemented is by actually interpreting the
instruction and if it is not a branch, then simply do the breakpoint swap
mentioned in the post. If it _is_ a branch, then decode the instruction (for
all the various types of branch instructions!), actually compute all forms of
the implicit/explicit branch target (register based, offset based, absolute
addresses, etc) and do the instruction swap at all the possible branch
targets. After one of the branches is taken, then put everything back. Oh, and
care must be taken to not disturb atomics...

[https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=b...](https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/rs6000-aix-tdep.c;h=acd52bbd9548d6600a7246f8d984410bea72fc64;hb=HEAD#l670)
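
The shape of that approach, very loosely: work out every address the current instruction could continue at, and put a temporary breakpoint on each. The sketch below is not gdb's code. The tuple-based instruction encoding and the `step_breakpoints` helper are invented for illustration, and real branch decoding (register-based targets and so on) is far messier.

```python
INSN_SIZE = 4  # PowerPC instructions are a fixed 4 bytes

def step_breakpoints(pc, insn):
    """Addresses needing a temporary breakpoint to emulate one single step.

    `insn` is a toy tuple: ("plain",), ("jump", target) or
    ("cond_branch", target).
    """
    kind = insn[0]
    if kind == "plain":
        return {pc + INSN_SIZE}              # only the fall-through
    if kind == "jump":
        return {insn[1]}                     # unconditional: only the target
    if kind == "cond_branch":
        return {pc + INSN_SIZE, insn[1]}     # either outcome is possible
    raise ValueError(f"unknown toy instruction kind: {kind}")

print(sorted(hex(a) for a in step_breakpoints(0x100, ("cond_branch", 0x200))))
```

Once one of the temporary breakpoints is hit, all of them are removed again, which matches the "put everything back" step described above.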

~~~
umanwizard
Why would you need to put breakpoints in more than one place? Presumably when
you're about to execute the instruction, you know the values of all the
registers and memory addresses so you can just interpret where the jump goes.
What am I missing?

~~~
apaprocki
If memory serves, there are certain branch conditions that can't be easily
queried via ptrace to implement it that way. So instead, you enumerate the
targets and break on all of them.

