
Making a low level Linux debugger, part 2: C - asrp
https://blog.asrpo.com/making_a_low_level_debugger_part_2
======
saagarjha
> By convention, the stack is between the value of registers rbp (lower
> address) and rsp (higher address) and rsp increases when there are more
> stack frames added. We're on 64-bit Linux so each frame takes up 8 bytes.

Stack frames vary in size depending on the local variables allocated on the
stack. Also, $rbp and $rsp need not point at the frame boundaries for leaf
functions, at least on Linux, because the System V ABI lets a leaf function
use the 128-byte red zone below $rsp without adjusting $rsp.
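
A rough way to check the direction of stack growth yourself, assuming GCC or
Clang on x86-64 Linux: the sketch below compares the address of a local in a
caller with the address of a local in its callee. The function names are made
up for illustration, and `noinline` is a compiler extension, not standard C.

```c
#include <inttypes.h>
#include <stdio.h>

/* noinline (GCC/Clang extension) keeps the callee in its own frame;
   if it were inlined, both buffers would share one frame. */
__attribute__((noinline))
static int callee_is_lower(uintptr_t caller_local)
{
    volatile char buf[64];            /* locals make this frame larger */
    buf[0] = 1;
    uintptr_t mine = (uintptr_t)&buf[0];

    printf("caller local at 0x%" PRIxPTR "\n", caller_local);
    printf("callee local at 0x%" PRIxPTR "\n", mine);
    return mine < caller_local;       /* 1 if the callee sits lower in memory */
}

/* Hypothetical helper: returns 1 when a newer frame has a lower address. */
int stack_grows_down(void)
{
    volatile char buf[64];
    buf[0] = 1;
    int lower = callee_is_lower((uintptr_t)&buf[0]);
    buf[1] = (char)lower;             /* use buf after the call so the frame stays
                                         live and the call is not a tail call */
    return lower;
}
```

On x86-64 Linux `stack_grows_down()` should return 1: the callee's local lives
at a lower address, so $rsp decreases as frames are added. The distance between
the two printed addresses depends on how many locals each function has, not on
a fixed 8 bytes.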

~~~
asrp
Thank you. I'll add an edit to the post later (here's hoping this[1] is
accurate enough). I do not know the conventions well, and this debugger/editor
in part helps me see what's going on.

In general, is there some "don't do anything funky" compiler flag so it sticks
to a simpler internal model?

[1]
[https://wiki.osdev.org/Calling_Conventions](https://wiki.osdev.org/Calling_Conventions)

~~~
jcranmer
> In general, is there some "don't do anything funky" compiler flag so it
> sticks to a simpler internal model?

Most of my experience is with LLVM/Clang, so I can't say too much about how
gcc differs in its thought model.

Modern compilers use SSA as the basis for common optimizations. In SSA form,
every variable has exactly one definition (different writes originating on
different control flow paths are represented with special phi constructs).
Conversion to SSA form pretty much irrevocably destroys the original notion of
variables. The standard big optimization passes will further destroy any easy
mapping to the source code: code is pushed out of loops if possible,
unexecutable control flow paths are removed, redundant computations (including
both within statements and across the entire function) are eliminated, etc.
This becomes particularly tricky in the backend, where register allocation
means that some variables just don't exist in state anymore (because you
needed that space for something else, and it's dead, so why keep it
somewhere?).
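
To make the SSA point concrete, here is a hand-written sketch, not actual
compiler output. `pick` is an ordinary C function; `pick_ssa` is my own
rendering of roughly what it looks like once every name has a single
definition, with a conditional expression standing in for the phi node. Both
function names are invented for the example.

```c
/* Original source: x is written on two different control flow paths. */
int pick(int cond)
{
    int x;
    if (cond)
        x = 10;
    else
        x = 20;
    return x + 1;
}

/* SSA-style rewrite by hand: each name is defined exactly once.
   The source variable "x" no longer exists as a single thing. */
int pick_ssa(int cond)
{
    const int x1 = 10;              /* definition from the "then" path */
    const int x2 = 20;              /* definition from the "else" path */
    const int x3 = cond ? x1 : x2;  /* stands in for x3 = phi(x1, x2)  */
    const int t1 = x3 + 1;          /* temporary for the return value  */
    return t1;
}
```

Both functions compute the same result, but a debugger asked for "the value of
x" in the second form has three candidates to choose from, which is one reason
the mapping back to source gets lossy.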

What this means is that, when optimizing code, the maintenance of debugging
information is very much a best-effort. If you disable optimization, you get
something that is relatively akin to a very literal translation of C code to
assembly. Even very basic optimizations, however, will almost immediately
destroy the basic guarantees. -O1 (or -Og for gcc) will generally avoid the
optimizations that do the truly insane manipulations, but you're still liable
to get this issue.
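
A small test case for trying this out, as a sketch: the file name `demo.c` and
the function below are hypothetical, and what the debugger actually reports
will depend on the compiler version.

```c
/* Build two ways and compare in gdb (assuming the file is demo.c and you
 * add your own main that calls sum_to):
 *   gcc -g -O0 demo.c    # near-literal translation, every local has a stack slot
 *   gcc -g -O2 demo.c    # locals may live in registers or vanish entirely
 * Then in gdb: break sum_to, step into the loop, print square.
 * With optimization on, gdb may answer "<optimized out>". */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        int square = i * i;   /* short-lived local, an easy target for removal */
        total += square;
    }
    return total;
}
```

The result of the function is the same at every optimization level; only the
debugger's view of `square`, `i`, and `total` changes.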

The basic representation for debugging information on Linux and OS X is DWARF.
DWARF is a nasty specification to read, and it doesn't insulate you from
having to learn all of the C or C++ ABI implications. There is a facility in
DWARF to describe variables that aren't located on the stack, but it doesn't
look like compilers maintain debugging information well enough when variables
are promoted to registers instead.

~~~
asrp
Thanks. The choice of gcc was somewhat arbitrary, so clang could work too. I
actually fiddled a bit with lldb before this.

From your description, it sounds like I really ought to remove
optimizations (with -O0, from what's suggested here).

For variables, local variables can be optimized out (something I recall seeing
in gdb without -O0) but all global variables are still kept, right? (At least,
the ELF has names and addresses.)

~~~
jcranmer
Compilers are free to delete global variables, if nothing references them,
just like local variables. That said, if you don't declare them static or some
other form of private variable, then compilers generally need to assume that
some unknown entity can refer to them, which generally prevents their removal.
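
A sketch of the distinction, with invented variable names. One way to check
what survives is to compile with something like `gcc -O2 -c globals.c` and
list the symbols with `nm globals.o`; the file name is an assumption, and the
exact outcome depends on the compiler and flags.

```c
/* Internal linkage and never referenced: the compiler can prove nothing
   uses it, so it is free to drop it from the object file (and will
   usually warn that it is unused). */
static int unused_private = 42;

/* External linkage: some other translation unit might refer to it,
   so the compiler generally has to keep it. */
int visible_global = 7;

/* Internal linkage but referenced below, so it cannot simply be deleted
   (though the optimizer may still transform how it is stored). */
static int used_private = 1;

int bump(void)
{
    used_private += visible_global;
    return used_private;
}
```

In the `nm` output I would expect `visible_global` and `bump` to show up, and
`unused_private` to be missing in an optimized build.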

------
aleden
Where's part one? Author states "Last time", but that hyperlink just goes to
the same page it's on.

[https://blog.asrpo.com/making_a_low_level_debugger_part_1](https://blog.asrpo.com/making_a_low_level_debugger_part_1)
gives 404

~~~
christophergray
[https://blog.asrpo.com/making_a_low_level_debugger](https://blog.asrpo.com/making_a_low_level_debugger)

