And this is not really an explanation of "how a debugger works," or even "how gdb works." ptrace is just one of several debug targets for gdb. There are simulators, core files, various embedded monitors, VxWorks, Windows, gdb remote debug servers over various interfaces, and on and on. ptrace is irrelevant to other targets.
I have an episode ideas doc for it that is several pages long. I need to flesh out an episode and shoot a test. After that, I need to figure out subject order.
A JITted kernel capable of partial evaluation. The thesis is very readable; proof: I enjoyed almost all of it, and I'm a monkey. It might give some insights about kernel design that the mainstream texts don't cover.
ps: mandatory wikiwiki page http://c2.com/cgi/wiki?SynthesisOs
Debuggers do NOT use invalid instructions. On x86 and AMD64 they use 0xCC (INT 3); on other platforms they use whatever dedicated trap-to-debugger instruction there may be. There's usually no need to use an invalid instruction to do debugging.
They can be used from the Windows Kernel Debugger (WinDbg), although I'm not sure if/how Linux-based gdb uses them.
Only feedback I would give is to remove the shadow on your text, I had to manually disable the shadow before I was able to read :).
Regarding your feedback: I don't see any shadow in FF! The CSS is not mine, but there is indeed a `text-shadow` set for the text, so I've removed it. Let me know if it's still bad, and maybe which browser you're using.
covers this topic, as well as writing a debugger, and basic fuzzing.
I think "hack" is a little too strong a word. CPUs, historically, have had too little state to usefully track all the execution breakpoints you might want to set. So they normally have a nice short instruction that the OS can trap. On x86, it's INT 3, aka interrupt 3, encoded in a single byte, 0xCC. Interrupts are handled by entries in the interrupt vector table (IVT; in protected mode its counterpart is the interrupt descriptor table, IDT), normally in a region of memory only the OS can write to (if you're not doing something like running DOS in real mode). So when the CPU tries to execute 0xCC, it instead transfers control to the handler named by the fourth entry (3 counted from 0) in that table. This will be filled by the OS, for modern values of OS.
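To make the "fourth entry" concrete, here is a minimal sketch of the real-mode case only (protected-mode descriptors are laid out differently). In real mode each IVT entry is 4 bytes, a 16-bit offset followed by a 16-bit segment, starting at physical address 0. The memory image and the handler location 0x1234:0x0010 are made up for illustration.

```python
import struct

def ivt_entry_address(vector: int) -> int:
    """Real-mode IVT: entry n lives at physical address n * 4."""
    return vector * 4

def handler_address(memory: bytes, vector: int) -> int:
    """Read the offset:segment pair for a vector and form the 20-bit address."""
    offset, segment = struct.unpack_from("<HH", memory, ivt_entry_address(vector))
    return ((segment << 4) + offset) & 0xFFFFF

# Hypothetical memory image: vector 3 (the INT 3 handler) points at 0x1234:0x0010.
memory = bytearray(1024)
struct.pack_into("<HH", memory, ivt_entry_address(3), 0x0010, 0x1234)
```

Vector 3 sits at byte offset 12, which is why it is the fourth entry when counting from zero.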
(Single-stepping CPU instructions is done with a mode flag on the CPU for x86, the trap flag; this lets the CPU single-step things like the REPNZ prefix used with string instructions.)
Other CPUs don't have that. So on a 6502, to do single stepping, you write BRK (software interrupt) over the first byte of the next instruction. When handling the BRK, you write the original instruction back into memory, then insert BRK at the next instruction. The hard part in all this is finding the next instruction. Not only do you have to decode the instructions to find it, but you also have to handle conditional branches, which either involves figuring out which path will be taken and inserting the BRK in the appropriate location, or inserting the BRK into both paths (and having to restore both later). This also precludes single stepping through ROM.
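A rough sketch of the "BRK on both paths" variant, in Python over a byte array standing in for 6502 RAM. It only knows a handful of opcodes (the table is deliberately incomplete), and the function names are mine, not from any real monitor. It also does not handle a branch that targets itself.

```python
BRK = 0x00

# Opcode -> instruction length in bytes, for a small illustrative subset only.
LENGTHS = {
    0xEA: 1,  # NOP
    0xE8: 1,  # INX
    0xCA: 1,  # DEX
    0xA9: 2,  # LDA #imm
    0x8D: 3,  # STA abs
    0x4C: 3,  # JMP abs
    0xD0: 2,  # BNE rel
    0xF0: 2,  # BEQ rel
}
BRANCHES = {0xD0, 0xF0}
JMP_ABS = 0x4C

def successors(mem, pc):
    """Return every address the instruction at pc might continue at."""
    opcode = mem[pc]
    fall_through = (pc + LENGTHS[opcode]) & 0xFFFF
    if opcode == JMP_ABS:
        # Absolute jump: the operand is a little-endian 16-bit target.
        return {mem[pc + 1] | (mem[pc + 2] << 8)}
    if opcode in BRANCHES:
        # Relative branch: signed 8-bit offset from the following instruction.
        offset = mem[pc + 1]
        if offset >= 0x80:
            offset -= 0x100
        return {fall_through, (fall_through + offset) & 0xFFFF}
    return {fall_through}

def arm_step(mem, pc):
    """Plant BRK at each possible successor; return the bytes overwritten."""
    saved = {}
    for addr in successors(mem, pc):
        saved[addr] = mem[addr]
        mem[addr] = BRK
    return saved

def disarm(mem, saved):
    """Put the original bytes back once the BRK handler has run."""
    for addr, original in saved.items():
        mem[addr] = original
```

For a loop like DEX / BNE back to the DEX, stepping the BNE plants two BRKs, one on the loop head and one on the instruction after the branch, and both have to be restored afterwards.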
 - http://www.nostarch.com/debugging.htm
If the code you are debugging is executing out of writable memory, then your debugger can support an effectively infinite list of breakpoints by writing some instruction that causes an exception to be thrown anywhere you want a break, then handling that exception by looking up the faulting instruction address in your breakpoint list.
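Roughly, the bookkeeping looks like this. It's a toy model in Python over a byte array, not how any particular debugger stores things; the class and method names are invented. It assumes the x86 convention that the address reported after an INT 3 is one byte past the 0xCC.

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction

class BreakpointTable:
    """Software breakpoints over a writable memory image (toy model)."""

    def __init__(self, memory):
        self.memory = memory
        self.saved = {}  # breakpoint address -> original byte

    def insert(self, addr):
        # Remember the byte being overwritten so it can be restored later.
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = INT3

    def remove(self, addr):
        self.memory[addr] = self.saved.pop(addr)

    def lookup_trap(self, reported_pc):
        """Map the post-trap program counter back to a known breakpoint."""
        addr = reported_pc - 1
        return addr if addr in self.saved else None
```

The table is just a dictionary, so the number of breakpoints is limited by memory rather than by how many debug registers the CPU has.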
Most processors also give you hardware data access breakpoints. These typically sit on the CPU's data bus interface and can fire when the address or data that's about to hit the bus matches the respective breakpoint registers. There is usually an option to trap only reads, only writes, or both. Sometimes you get interesting things like a mask register that lets you trap on a whole block of memory.
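The address-plus-mask comparison can be modeled in a few lines. This is a generic illustration, not the register layout of any specific CPU; the field names are made up, and here a set mask bit means "this address bit must match".

```python
from dataclasses import dataclass

@dataclass
class Watchpoint:
    address: int
    mask: int = 0xFFFFFFFF  # set bits must match; clear bits are "don't care"
    on_read: bool = True
    on_write: bool = True

    def hits(self, access_address, is_write):
        """Would this bus access trigger the watchpoint?"""
        if is_write and not self.on_write:
            return False
        if not is_write and not self.on_read:
            return False
        return (access_address & self.mask) == (self.address & self.mask)

# Trap writes anywhere in the 256-byte block starting at 0x2000.
block_writes = Watchpoint(0x2000, mask=0xFFFFFF00, on_read=False)
```

Clearing the low eight mask bits is what turns a single-address watchpoint into one covering the whole 256-byte block.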
One of the most interesting hardware debug features of modern processors is the branch trace, which keeps a running list of the last N branches taken so that you can reconstruct a "how did we get here" story.
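Conceptually it's a fixed-size ring buffer of (from, to) pairs. A software model, with invented names and no claim about how any particular CPU exposes it:

```python
from collections import deque

class BranchTrace:
    """Keep only the most recent `depth` taken branches."""

    def __init__(self, depth):
        # A deque with maxlen silently drops the oldest record when full.
        self.records = deque(maxlen=depth)

    def record(self, source, target):
        self.records.append((source, target))

    def history(self):
        """Oldest-to-newest list of (source, target) pairs still retained."""
        return list(self.records)
```

Reading the history backwards from the crash site gives you the last few control transfers, even when the stack is too damaged to unwind.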