
Debug Information Is Huge and What to Do About It - sbahra
https://documentation.backtrace.io/dwarf/
======
dgwynne
Solaris developed another solution, specifically CTF (Compressed Type Format).
CTF stores data types and function signatures rather than full debug info, and
is therefore much smaller than the DWARF information it is derived from.

The entire Solaris system is built with CTF enabled, which is used to support
their debuggers and dtrace. Other systems have adopted it too. OpenBSD is
moving to use CTF, and has enabled its use in the kernels and debuggers on
some architectures.

To get a sense of the size difference, DWARF information for a sparc64 kernel
is about 27 megabytes. The CTF information derived from it is 473 kilobytes.
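
To put a number on that, here is a quick back-of-the-envelope calculation
using only the two figures quoted above. It assumes decimal units (1 MB =
1000 KB); with binary units the ratio comes out closer to 58x than 57x.

```python
# Figures quoted above for a sparc64 kernel. Decimal units are an assumption.
dwarf_bytes = 27 * 1000 * 1000  # "about 27 megabytes" of DWARF
ctf_bytes = 473 * 1000          # "473 kilobytes" of CTF

ratio = dwarf_bytes / ctf_bytes        # how many times larger DWARF is
saving = 1 - ctf_bytes / dwarf_bytes   # fraction of space saved by CTF

print(f"DWARF is about {ratio:.0f}x the size of CTF")
print(f"CTF is about {saving:.1%} smaller")
```

Either way, the CTF data is under 2% of the size of the DWARF it came from.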

~~~
valleyer
It sounds like CTF (which I'm admittedly only peripherally aware of through
exposure to dtrace) is not really a viable replacement for most DWARF use
cases then, right? Why is the size comparison valid?

~~~
wkz
Function boundaries, arguments and data types will get you a long way in my
experience. Sure, there are times when you need to look at local variables in
the middle of a function etc., but half of those times DWARF won't have the
register information anyway. Remember that the use case here is production:
your code will be optimized.

So you'll still end up disassembling the thing to figure out which register to
look at. And you might as well do that on the unstripped binary on your
development box. For embedded targets at least, that's a small price to pay
for a size reduction of that magnitude.

------
awalton
Use build-id
([https://fedoraproject.org/wiki/Releases/FeatureBuildId](https://fedoraproject.org/wiki/Releases/FeatureBuildId)),
strip the debug info into its own object file, and point gdb at your debug
objects when it comes time to debug.

Lean distributable binaries, yet still debuggable. Win-win.
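
For anyone wondering how the debugger finds the split-out file: a rough
sketch of the conventional on-disk layout, where the first two hex digits of
the build ID become a directory and the rest becomes the file name. The
helper name, the `/usr/lib/debug` default and the example ID are assumptions
for illustration; check your distro and your gdb `debug-file-directory`
setting for the real root.

```python
from pathlib import PurePosixPath

HEX_DIGITS = set("0123456789abcdef")

def debug_file_for_build_id(build_id: str,
                            debug_root: str = "/usr/lib/debug") -> PurePosixPath:
    """Map a hex build ID to <root>/.build-id/<xx>/<rest>.debug (hypothetical helper)."""
    hex_id = build_id.strip().lower()
    if len(hex_id) < 3 or not set(hex_id) <= HEX_DIGITS:
        raise ValueError(f"not a usable hex build ID: {build_id!r}")
    # First byte (two hex digits) is the directory, the remainder is the file name.
    return PurePosixPath(debug_root) / ".build-id" / hex_id[:2] / (hex_id[2:] + ".debug")

# Made-up build ID, for illustration only.
print(debug_file_for_build_id("ab12cd34ef56"))
```

The same keying scheme works for a shared directory or server: store each
debug file under its build ID and the lookup needs nothing but the ID found
in the binary or core dump.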

~~~
psykotic
And ideally you just put all the debug info files on a server where they can
be indexed by the build ID, and the debugger automatically tries to fetch the
relevant debug info files when debugging a process or core dump. This is easy
on Windows since dbghelp.dll (on top of which most debuggers are built) lets
you put symbol server URLs in the symbol path list and it will automatically
look up GUIDs there and cache downloaded PDBs locally for reuse. As far as I
know, there still isn't a turn-key equivalent with GDB, but you could probably
rig something up with Python scripting?

The next step up in convenience is having the source-level debugger understand
how to fetch the corresponding source files from a server based on revision
IDs or content hashes.

Between those two features, you can open a core dump on any machine, and
immediately have access to both debug info and source without manually mucking
around with anything. If you spend a lot of time looking at core dumps from a
myriad of different programs and versions of those programs, this is a
lifesaver.

------
bananaboy
It's frustrating when developers distribute binaries without debug information
under the mistaken assumption that it's going to hurt performance. At a
previous company I worked at we were using Scaleform (UI solution for games)
and they refused to ship debug information with their release builds. I
reported it as a bug and sent them links and information about how it wouldn't
affect performance, but they still refused. In the end I just built it myself.

~~~
taneq
Even if they encouraged you to ship your finished product with debug info
stripped, surely they could have sent you a set of release builds with debug
info for debugging purposes? Having faced those "works perfectly in debug,
crashes mysteriously in release" situations myself in the past, they're bad
enough to track down even with debug info...

------
loeg
Minor typo:

> A more efficient representation of .debug_rnglists introduced in DWARF 5.

Should probably read:

> A more efficient representation of .debug_ranges introduced in DWARF 5.

Edit: Tangentially, I loved
[https://backtrace.io/blog/compile-once-debug-twice-picking-a-compiler-for-debuggability-1of3/](https://backtrace.io/blog/compile-once-debug-twice-picking-a-compiler-for-debuggability-1of3/)
and am still looking forward to parts 2 and 3. Thanks!

~~~
sbahra
Whoops, good catch! I will fix, thanks.

------
jquast
Please, Arch Linux developers: stop stripping all packages of debug symbols.

~~~
d33
Isn't it a common practice to strip, but keep them in separately published
files that only contain symbols?

~~~
codys
It is, in Linux distros at large. Arch, however, does not publish (or even
generate) those debug files (which contain the debug info & symbols that were
stripped out).

------
mabynogy
The main pain point for me in that domain currently is the wrong results that
backtrace() gives when optimizations are enabled.

I noticed that clang seems to be better at generating correct debug info in
such modes.

Otherwise I use GDB in "batch mode" to get a callstack triggered by
raise(SIGTRAP):

> gdb -quiet --batch -ex run -ex backtrace --args "$binary" "$@"
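
If you would rather drive that from a script than from the shell, a minimal
sketch follows. It only wraps the same gdb flags shown above; the function
names are made up for illustration, and it assumes `gdb` is on your PATH.

```python
import subprocess

def gdb_backtrace_argv(binary, program_args=()):
    """Build the argv for the batch-mode gdb run (hypothetical helper)."""
    return ["gdb", "-quiet", "--batch",
            "-ex", "run", "-ex", "backtrace",
            "--args", binary, *program_args]

def run_with_backtrace(binary, program_args=()):
    """Run the program under gdb and return gdb's stdout, backtrace included."""
    result = subprocess.run(gdb_backtrace_argv(binary, program_args),
                            capture_output=True, text=True, check=False)
    return result.stdout

# Only builds the command here; call run_with_backtrace() to actually run gdb.
print(gdb_backtrace_argv("./myprog", ["--flag", "file with spaces.txt"]))
```

Passing the arguments as a list sidesteps shell word-splitting, so arguments
containing spaces reach the program intact.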

~~~
loeg
backtrace(3) doesn't use or understand DWARF debug information at all — it's
purely a machine stack (doesn't understand tail calls or inlined functions)
and can only look up ELF symbols.

It's better than nothing, but consider using something like libunwind instead.

~~~
mabynogy
Many thanks. I'll check it.

The ultimate goal is to get a backtrace without debug symbols (in a release
build), but I don't know of any method to achieve that without instrumentation.

~~~
loeg
You need some debug information to get accurate backtraces. But you can
discard the rest of the debug information that you don't need. The article
actually mentions this a little bit near the end, starting around:

> For example, if you would only like accurate unwinding then you can retain
> only .debug_frame and .debug_line.

