Debugging C with Cosmopolitan Libc (ahgamut.github.io)
191 points by ahgamut on Oct 24, 2022 | 50 comments



> Cosmopolitan Libc allows you to log every function call over the program’s execution – just pass --ftrace at the end of your program, like this:

> ./hex16.com ./missing.txt --ftrace

That seems like magic. It seems you have to build with a special flag that tells GCC to put a nop at the start of each function. Then, before main runs, Cosmopolitan sees the --ftrace on the command line, and modifies the code in memory to replace those nops with calls to something that does the logging. See https://justine.lol/ftrace/.
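The idea in this comment can be sketched in portable C as an analogy. This is not how Cosmopolitan does it: instead of rewriting machine code, the sketch swaps a function pointer that every function calls on entry. The names `g_entry_hook`, `enable_tracing_if_requested`, and `square` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of calls the logger has seen; only here so the effect is observable. */
int g_logged_calls = 0;

/* Default hook: does nothing, standing in for the nop at function entry. */
static void hook_nop(const char *name) { (void)name; }

/* Logging hook: stands in for the call that replaces the nop. */
static void hook_log(const char *name) {
    g_logged_calls++;
    fprintf(stderr, "FUN %s\n", name);
}

/* Every "instrumented" function calls through this pointer on entry. */
static void (*g_entry_hook)(const char *) = hook_nop;

/* Scans argv once at startup; returns 1 if tracing was switched on. */
int enable_tracing_if_requested(int argc, char **argv) {
    assert(argv != NULL);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--ftrace") == 0) {
            g_entry_hook = hook_log;
            return 1;
        }
    }
    return 0;
}

/* An example function with the hook at its entry point. */
int square(int x) {
    g_entry_hook("square");
    return x * x;
}
```

A program would call `enable_tracing_if_requested(argc, argv)` first thing in `main`. Patching real nops avoids even the indirect call when tracing is off, which is the part that needs compiler support.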


> It seems you have to build with a special flag that tells GCC to put a nop at the start of each function.

BTW - llvm+clang support a similar feature called "XRay" [1]. It's intended to allow for patching (and unpatching!) these entry/exit points at runtime, with calls to specific tracing / black box facilities or whatever your favorite logging mechanism might be. And it's not limited to x86_64.

[1] https://llvm.org/docs/XRay.html


I am not sure how it's done with Cosmopolitan Libc, but with MSVC++, when compiling with hotpatching enabled, every function got a `mov edi, edi` instruction at the start, so that there were a few useless bytes that could be safely replaced. That avoids the need for a trampoline.
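A rough sketch of what replacing those bytes looks like, operating on a plain byte buffer rather than live code. It assumes the commonly described x86 layout: `mov edi, edi` is encoded as `8B FF`, and the patch is a two-byte short jump back into padding that sits just before the function. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the code starts with the two-byte `mov edi, edi` (8B FF),
   i.e. there is room for a two-byte patch at the entry point. */
int has_hotpatch_prologue(const uint8_t *code, size_t len) {
    assert(code != NULL);
    return len >= 2 && code[0] == 0x8B && code[1] == 0xFF;
}

/* Overwrites the two-byte no-op with a short jump (EB rel8).
   rel8 is relative to the end of the jump instruction, so -7 lands
   5 bytes before the function start, where padding is assumed to
   hold a longer jump to the replacement. Returns 1 on success. */
int apply_hotpatch(uint8_t *code, size_t len) {
    if (!has_hotpatch_prologue(code, len)) return 0;
    code[0] = 0xEB;
    code[1] = (uint8_t)-7; /* 0xF9 */
    return 1;
}
```

Because the replaced instruction did nothing, no original instruction has to be relocated, which is why no trampoline is needed in this scheme.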


You don't in principle have to put any nops at the start of each function for this kind of function "hooking" to work. The Microsoft Detours library does the same thing with no special compilation needed.


From https://www.microsoft.com/en-us/research/project/detours/

"Detours is a library for instrumenting arbitrary Win32 functions on Windows-compatible processors. Detours intercepts Win32 functions by re-writing the in-memory code for target functions."

Doesn't that just re-write the DLL import table? i.e. it only allows interception of Win32 API calls, not interception of my function A calling my function B?


No, it inserts a jump instruction at the start of the function, and just overrides whatever is there already. To allow you to call the "original" function, it does some analysis to figure out what it's stomping on, and creates another little bootstrap function that executes that one destroyed instruction, then jumps to the second instruction of the original function.
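To make the "inserts a jump" step concrete, here is a small sketch of how an x86 near jump (`E9` followed by a 32-bit little-endian displacement) could be encoded into a buffer. It only computes the bytes. It does not touch executable memory, and it skips the hard part a real library handles: decoding the instructions being overwritten so they can be copied into the trampoline. `encode_jmp_rel32` is a hypothetical helper, not part of Detours.

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_SIZE 5

/* Writes a `jmp rel32` that, if placed at address `from`, would transfer
   control to address `to`. The displacement is measured from the end of
   the 5-byte instruction. Returns 1 on success, 0 if out of range. */
int encode_jmp_rel32(uint8_t out[JMP_REL32_SIZE], uint64_t from, uint64_t to) {
    assert(out != 0);
    int64_t disp = (int64_t)(to - (from + JMP_REL32_SIZE));
    if (disp < INT32_MIN || disp > INT32_MAX) return 0;
    uint32_t d = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;                        /* opcode: jmp rel32 */
    out[1] = (uint8_t)(d & 0xFF);         /* displacement, little-endian */
    out[2] = (uint8_t)((d >> 8) & 0xFF);
    out[3] = (uint8_t)((d >> 16) & 0xFF);
    out[4] = (uint8_t)((d >> 24) & 0xFF);
    return 1;
}
```

A hook would use this twice: once at the start of the target function (jumping to the replacement), and once at the end of the trampoline (jumping back to the first instruction that was not overwritten).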

Basically: black magic



Isn't this why Windows has MOV EDI, EDI at the start of a function? https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=95...


No. It sounds like Detours works even when the function hasn't got MOV EDI, EDI at the start.


So much innovation in a tiny libc.


I'm fascinated specifically with how Justine must have collected the knowledge to write Cosmopolitan. I know some guys who play around with PEs and virtual memory manually, but I always wonder what sort of interests lead you to discovering this sort of thing.

I suspect the combination of interests is a little out there. For instance, with game cheat developers, you tend to first at least have some interest in C++, then understanding memory scanning, then signature scanning, trampolining, and all of a sudden you have the skillset ingredients for authoring some rudimentary cheats. Advanced skills come with driver development, which you then pick up to figure out how to evade anti-cheat technology, etc.

But very few developers I know say, oh yeah, I was just interested directly in this sort of thing from the get-go and decided to pick up all of the specific skills to go straight to cheat development.

Usually, it's the guys who already have game development experience.

What in the world did Justine see before Cosmopolitan? Maybe debugging tech? An interest in creating her own libc and understanding syscalls? Just fascinating.


Justine here. I worked on TensorFlow before working on this.


My true introduction to low level system internals was heavily influenced by Saleem (compnerd), who last time I caught up with them, was on TensorFlow at Google.

He's a force of nature, but god damn if I didn't learn a lot from him and his peers. Cheers to you and to him!


I'm not Justine, but I have done some things involving object and program file formats. Before I was a professional dev, in my teenage years, I was interested in OS internals and was hacking on a small kernel for fun and learning. Some of my early professional tasks involved kernel mode. I think some people just get bit with a low level bug.


What's the advantage over calling gdb yourself?


You run the executable as usual, and if it crashes, Cosmopolitan Libc's crash handler starts gdb for you with a nice TUI and all the information you need to work with. Probably you could set up a bunch of macros in your editor that do the same thing (and I do have something like that for godbolt compiler explorer), but for gdb at least it's pretty convenient to have it performed by the executable when running it across systems. You can start gdb by yourself too if you have a workflow setup like that :)


+1. For most of my programming life, I have had my ASSERT defined to 1) signal the program to pause it, 2) start gdb (or ddd), 3) attach to the pid, 4) tell gdb what the binary is, and 5) wait on me. No idea why this is not the default.
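A minimal POSIX sketch of an assert along these lines, assuming `gdb` is on the PATH. It relies on gdb's `-p` option to attach by pid. `DEBUG_ASSERT`, `format_pid`, and `attach_debugger_and_wait` are hypothetical names, and some Linux configurations restrict ptrace attach, so this may need extra permissions to work.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Formats a pid as decimal text for gdb's -p option. Returns 1 if it fit. */
int format_pid(char *buf, size_t size, long pid) {
    assert(buf != NULL);
    int n = snprintf(buf, size, "%ld", pid);
    return n > 0 && (size_t)n < size;
}

/* Forks a child that runs `gdb -p <our pid>`; this process then blocks
   until gdb exits. Returns 0 on success, -1 on failure. */
int attach_debugger_and_wait(void) {
    char pid_text[32];
    if (!format_pid(pid_text, sizeof pid_text, (long)getpid())) return -1;
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        execlp("gdb", "gdb", "-p", pid_text, (char *)NULL);
        _exit(127); /* only reached if gdb could not be started */
    }
    int status;
    return waitpid(child, &status, 0) == child ? 0 : -1;
}

/* Always-on assert: report, hand the process to the debugger, then stop. */
#define DEBUG_ASSERT(cond)                                        \
    do {                                                          \
        if (!(cond)) {                                            \
            fprintf(stderr, "%s:%d: assertion failed: %s\n",      \
                    __FILE__, __LINE__, #cond);                   \
            attach_debugger_and_wait();                           \
            abort();                                              \
        }                                                         \
    } while (0)
```

The `abort()` after the debugger session means a non-interactive run still ends with a core dump rather than continuing.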


most programs are not run interactively, so dumping a core is a more sensible default.


You want the program to hang indefinitely in production because it hit an assert?


You want your program to continue executing in an undefined, probably erroneous, possibly destructive state because you ignored an error?


And this isn't theoretical. I've seen ignored errors do damage in the past. Was infuriating to see that we had set a top-level exception handler in our server that just logged and kept going. We had a couple tables in our database trashed because of that and ended up having to restore from the nightly backup, losing everything that was changed that day.


What do you think the default behavior of an assert is? If you are turning them off, then you get what you describe, which is not what I am advocating.


Assertions are compiled out of 'production' (non-debug) builds.

The same holds true of Python if you run it with the optimize flag/environment variable, leading to any number of hilarious bugs because people cause side-effects in their assert calls.
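The C version of the same trap, as a small sketch: the standard `assert` macro expands to nothing when `NDEBUG` is defined, so any side effect inside it silently disappears. Re-including `<assert.h>` after changing `NDEBUG` is allowed by the C standard and is used here only to show both behaviours in one file. The function names are made up.

```c
#include <assert.h>

static int g_calls;

/* Has a side effect: bumps a counter and returns the new value. */
static int next_id(void) { return ++g_calls; }

#define NDEBUG /* simulate a release build */
#include <assert.h>

int calls_in_release_build(void) {
    g_calls = 0;
    assert(next_id() == 1); /* expands to a no-op: next_id() never runs */
    return g_calls;
}

#undef NDEBUG /* simulate a debug build */
#include <assert.h>

int calls_in_debug_build(void) {
    g_calls = 0;
    assert(next_id() == 1); /* evaluated: next_id() runs once */
    return g_calls;
}
```

`calls_in_release_build()` returns 0 and `calls_in_debug_build()` returns 1, even though the two function bodies are textually identical.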


Compiling out assertions in production is like taking your seat belt off when you leave the driveway. Seriously, in my experience, all the interesting stuff happens in production. We deploy two builds, app.debug and app.release. We generally run app.debug unless there are performance issues, but even in that case, we sometimes run it just to make sure nothing shows up.


Same here. And it's an insignificant cost to have a terminal attached to the process. So instead of a mere stack trace and local vars on the stack in the core dump, I actually have the entire process, the heap with all the objects, all file descriptors and various OS handles and state, to interrogate. The program would have been running many hours, sometimes days, up to this point. Why would anyone throw all that valuable info away by default?


Of course - crash, hang, core dump, whatever, just do NOT continue as if nothing happened. My programs are in control of much $$$, I would be horrified if a program was to continue running in an undefined state. Errors and warnings are issued and handled by operations - that's part of the program logic. But if an assert is triggered then all bets are off, best to stop 99.9% of the time.


I can launch gdb and attach it to the current process in 5 lines, hardly worth pulling a library dependency for.


Cosmopolitan is a replacement standard library. It's not so much "pulling-in" a library as replacing e.g. glibc. The point of the blog post is that it's a libc but also a batteries included C development environment. On the one hand, it's nothing magic, on the other hand, it does have some nice concordances which are worth showing off.


This is exactly the sort of drudgery from which computers were supposed to liberate us.


Depending on any third-party software is a huge liability for any professional software development.

Dependencies must be chosen and managed carefully. I'm not going to depend on something for something that's trivial; the time it takes for me to write it is less than it takes to do a license review and integrate it into my build system.


Imagine if liberating one from drudgery were merely table stakes, and that it would be ridiculous to imagine that you would have to go to any effort to have the debugger pop up when shit fucks up.


Sometimes the binary I am trying to debug is nested within several layers of crap. I have written shims before to attach gdb to the running process so that I don't need to try and wrangle gdb through shell scripts.


If anyone's interested in how this actually works, take a look at this (originally linked) article: https://justine.lol/ape.html

This reminds me a little bit of how "Wine" works, but because the support doesn't involve "everything an OS has to provide" the footprint is smaller.


so where does the machine code for the TUI live? does it get compiled into your binary? how big is it?


The TUI is from gdb.[1] Nothing from GDB is compiled in your binary. My guess (might be wrong!) is that the ShowCrashReports() call added in your program's main function sets up some signal handlers, so that in case of segfaults it starts GDB in a separate process and attaches it to your crashing process.

[1]: https://sourceware.org/gdb/onlinedocs/gdb/TUI.html


Which you have to admit is quite simple but brilliant.


when did they add this?!?


I've read that you can use Cosmopolitan to write C++ as well.

Can you use similar features with C++ development?


Oh yes. A good place to start would be our c++ toolchain. https://github.com/jart/cosmopolitan/blob/master/tool/script...


Awesome, I will definitely be checking this out!


Perhaps I am missing something but what does this have to do with Cosmopolitan?


> With Cosmopolitan Libc, you can have gdb integration and detailed backtraces for your C program in just one line: add ShowCrashReports(); at the start of the main function

> Now if you have gdb available on $PATH, you would get a TUI (terminal user interface) showing the register contents and the backtrace of the crash


doesn't the first paragraph clearly establish that?


I mean it says these are part of Cosmopolitan but these just seem like standard tools that I don't need to use Cosmopolitan for…


There are many wonderful tools that help understand what our code is doing (compiler explorer, gdb, valgrind, ftrace, perf record, strace, ASAN/UBSAN). I wrote this blog post because using the debugging tools with Cosmopolitan Libc helped me learn and improve at programming. If you have other resources that can help me understand more about how my code works, I am happy to learn.


It explains how to use them in the way cosmopolitan provides them, which is clearly different from how other environments do it, even though they of course have ways of getting the same kind of information.


There are some things from Cosmopolitan that demonstrate its integration with GDB and other standard tools. For example, the article demos ShowCrashReports() which is nicely integrated with GDB. Also --ftrace, not to be confused with the kernel feature of the same name.


Kind of have the same question as this commenter: https://news.ycombinator.com/item?id=33312835


I wonder if it would be possible to have a kind of logging function that works similarly to --ftrace? That would significantly reduce the CPU cost when tracing is off.


[flagged]


Can we just enjoy the code and not bring politics into it?

> Cosmopolitan Libc makes C a build-once run-anywhere language, like Java, except it doesn't need an interpreter or virtual machine. Instead, it reconfigures stock GCC and Clang to output a POSIX-approved polyglot format that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + NetBSD + BIOS with the best possible performance and the tiniest footprint imaginable.

https://justine.lol/cosmopolitan/



