When developing PyTorch, we also run into a lot of mixed Python/C++ language situations. We've recently been experimenting with in-process 'combined' Python/C++/PyTorch 2.0 stack traces to make it easier to understand where code is executing (https://dev-discuss.pytorch.org/t/fast-combined-c-python-tor...).
GDB can decode Python stack traces as Python code, as long as you have python-gdb.py from the Python source distribution in the same place as your Python executable.
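Roughly, assuming the script and the interpreter's debug symbols are installed (paths and package names vary by distro), a session looks like this:

    $ gdb -p <pid-of-the-python-process>
    (gdb) source /path/to/python-gdb.py   # only needed if gdb didn't auto-load it
    (gdb) py-bt                           # Python-level backtrace (compare with plain 'bt')
    (gdb) py-list                         # Python source around the selected frame
    (gdb) py-locals                       # Python locals in the selected frame

The py-* commands come from that same script (Tools/gdb/libpython.py in the CPython tree).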
GDB does so many cool things. It just sucks that it's missing a high-quality front-end. I know there are a bunch out there, but they all seem pretty janky and get confused a lot; I just want a really serious group of people (some Red Hat people, for example) to build a high-quality native debugging app that properly speaks the GDB protocol and handles all the edge cases.
What I would also love is to be able to set Python breakpoints from gdb. And integration with rr, so I could reverse-continue to said breakpoints. And ponies.
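The closest thing I've found to a Python-level breakpoint is crude: break on the interpreter's frame-evaluation function and use the py-* commands to see where you are (the symbol name below is for CPython 3.6+, and the breakpoint fires on every Python call, so it's slow):

    $ gdb --args python my_script.py      # my_script.py is just a placeholder
    (gdb) break _PyEval_EvalFrameDefault  # PyEval_EvalFrameEx on older CPythons
    (gdb) run
    (gdb) py-bt                           # is this the Python frame I care about?
    (gdb) continue                        # if not, keep going

Since those are ordinary gdb breakpoints, rr's reverse-continue will stop at them too. No ponies, though.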
Wow, that's pretty amazing. I wonder how it's implemented, and if there are any tutorials on implementing something similar, for programming language designers/creators.
>> There's Voltron, an extensible Python debugger UI that supports LLDB, GDB, VDB, and WinDbg/CDB (via PyKD) and runs on macOS, Linux, and Windows. For the first three it supports x86, x86_64, and ARM, with ARM64 support for LLDB and PowerPC support for GDB on top of that.
https://github.com/snare/voltron
That reminds me of the time I debugged some code (also a neural network) that was in both Java and C++. I was able to attach both gdb and jdb to the same process, but I had to disable the segfault trap in gdb, because the JVM segfaults all the time in normal operation.
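For anyone who hits the same thing, the incantation is roughly:

    (gdb) handle SIGSEGV nostop noprint pass

i.e. tell gdb not to stop or print on SIGSEGV and to pass the signal on to the process, so the JVM's own handler gets to deal with it.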
As a C++ developer, I'm struggling to understand how and why segfaults would ever be part of normal operation. In my mental model, the presence of a segfault means that a program has gone so far off the expected path that no guarantees can be made about its state whatsoever, so the only safe thing to do is to let the program crash. Is there a reason why the JVM regularly segfaults?
Segfaults are unexpected and shouldn't happen in a conforming C++ program that doesn't evaluate operations with undefined behavior.
But in a non-portable C or C++ program that does manual memory management with OS primitives and targets a specific OS, the conditions for a segfault are well documented, and you can also rely on the program's behavior when one happens (the OS raises a signal for you, which you can handle).
There are not many programs that should be written this way, but I assume the JVM might fall into this category. I'm still not sure if handling page faults this way in regular operation is the best strategy, but I would worry about performance more than correctness.
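As a sketch of the kind of thing I mean (a generic POSIX guard-page trick, not the JVM's actual code): protect a page, take the fault on purpose, repair the mapping in the signal handler, and let the faulting instruction retry.

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char *page;       /* the deliberately protected page */
    static long page_size;

    static void on_segv(int sig, siginfo_t *info, void *ctx) {
        (void)sig; (void)ctx;
        char *addr = (char *)info->si_addr;
        if (addr >= page && addr < page + page_size) {
            /* Expected fault: make the page writable and return, so the
               faulting store is retried and succeeds this time.
               (mprotect isn't formally async-signal-safe -- fine for a demo.) */
            mprotect(page, page_size, PROT_READ | PROT_WRITE);
            return;
        }
        /* Any other segfault really is a bug: restore the default action. */
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
    }

    int main(void) {
        page_size = sysconf(_SC_PAGESIZE);
        page = mmap(NULL, page_size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = on_segv;
        sigaction(SIGSEGV, &sa, NULL);

        page[0] = 42;    /* faults once; the handler repairs the mapping */
        printf("recovered: page[0] = %d\n", page[0]);
        return 0;
    }

Runtimes use variations of this for guard pages, stack-overflow detection, GC barriers, and implicit null checks, which is why a debugger attached to the JVM sees "segfaults" constantly.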
An inspiring, albeit daunting, write-up for someone like me, who has been a python coder for over 10 years, but who has never professionally coded in C or C++. I'd be pleasantly surprised with myself if I could one day debug like that.
Also somewhat depressing that, yet again, the GIL was to blame, and that after all that impressive investigatory work, the fix was (spoiler alert!) "rewrite the offending function in C".
I predominantly develop in Python but somewhat frequently peer into the C++ and CUDA underlying PyTorch. I am not a competent C or C++ programmer, but I am often able to debug and fix bugs that originate in the lower levels of the stack. That ability developed just through exposure and focused prodding when I found issues, but I don't think it has much relation to, or translates into, being a professional coder in C/C++/CUDA. Coding and debugging are different skill sets, IMO.
It's unfortunate that no one thinks about the debugging experience with anything they build today. It's always a "strap yourself in" experience like in the post.
I dream of an IDE that gives you one-click, full-stack, local debugging.