Logging C function calls with cosmopolitan Libc (2022) (justine.lol)
25 points by fanf2 40 days ago | hide | past | favorite | 8 comments



Note you can use 'ltrace' with any dynamically linked Linux program to trace its library calls. https://www.ltrace.org/ https://en.wikipedia.org/wiki/Ltrace


And it can reproduce the same "print timestamps" example from the OP using the -tt flag ("If given twice, the time printed will include the microseconds"). strace accepts the same flag.


> I found no evidence of someone having invented it before.

Most Lisp/Scheme implementations offer tracing. Amazingly, the SICP book even shows tracing as a means of analyzing whether a recursive function is tail-recursive.

What I'm curious about is the Lisp history of tracing. Does anyone know which implementations were the first to provide tracing functionality?


In C land, a function entry/exit trace (or the reconstruction of one) is a prerequisite for time travel debugging, which has been commercially available for over 20 years.

If we are talking prologue/epilogue hooks, those have been commandeered (just as Cosmopolitan hijacked them) for bespoke entry/exit tracing for who knows how long.

If we are talking generic (non-bespoke), automatically instrumented, whole-program, strippable/injectable entry/exit tracing, then that has been commercially available for maybe a decade.

The reason it is not very popular is that naive implementations have high overhead, and the trace data is such a deluge of information that you need specialized visualization tools to make any sense of non-trivial programs.

That is because your average C program makes on the order of 10-100 million function calls per second per core in normal operation. I repeat: 100 million logging statements per second. Can your normal logging dependencies handle that? Even with just a 64-bit timestamp and nothing else, that is nearly 1 GB/s of logging data, a significant fraction of your memory bandwidth consumed by logging alone. You need optimized high-speed logging with efficient binary log formats to support that data rate without incurring immense overhead.

The visualization problem is not as bad these days either. The time travel debugging vendors had to handle even more ridiculous amounts of data, so they pioneered trace visualization techniques that work on even denser data. Everybody else has since copied those techniques into well-known trace visualizers such as Perfetto. That is why these visualizers work so well across the gamut of tracing implementations, no matter how dense the data, even though most of the web-development tracers feeding them are anemic: they lifted their designs from tools that actually had to invent how to visualize the hardest use cases.


Cosmopolitan Libc's --ftrace on my workstation logs 1 million lines per second for a program written in C, e.g. `python -c "print('hello')" --ftrace`. If I run clang-19 --ftrace, which is written in C++, then it does 476 thousand lines per second. It goes half as fast because the Cosmo Libc privileged runtime has to demangle C++ symbols before printing them in the log. Note I'm using `export KPRINTF_LOG=log`. It's hard to believe it goes so fast considering a single thread is doing this and kisdangerous() has to lock the memory manager and search a red-black tree every time a pointer is dereferenced (e.g. unwinding stack frames to print space indentation) by kprintf() or ftracer(). If the Linux kernel write() syscall didn't have so much overhead, it'd probably go faster.


Yeah, if you instead dumped dense binary records into the log and demangled/decoded at display time, you should be able to get much higher throughput.

A separate decode step, instead of going straight to the console, is not so bad because the throughput is already high enough that you cannot read it live; switching to a post-mortem dump with a dedicated decode step gives you a lot of options for efficiency.

If you want to go further, since you would no longer emit it directly into the console, you can relax flushing requirements and batch more data before persisting. You could also go for a dedicated log, make it single-producer/single-consumer, or other ways of optimizing the logging itself.

You could probably get 10x on just basic encoded formatting. The next 10x would likely require logging and storage optimizations. It is likely that your OS and storage devices would become your limiting factor at that point. The last 10x to get to the billion events per second per core level is turbo black magic and requires substantive changes to the compiler, operating system, and runtime to achieve.


It is a seemingly outlandish claim.

Even in just the C world, for instance, profiling tools can trace all function calls to determine how many times each is called and how much time is spent in it, and to recover the call graph.

The --ftrace option in Cosmopolitan shares a name with ftrace, the Linux kernel's function tracing mechanism.

What Justine may mean is that it's done in some new, unusual way; e.g. other than patching the initial instructions of functions to branch somewhere.


The first Lisp implementation in 1960 already had function tracing.



