
Ask HN: Got any tips for profiling and tuning C? - ghotli
I'm a casual C developer. Up until recently it's been kind of a love/hate relationship. I mostly attribute that to being ignorant of the breadth of debugging, tuning, and profiling toolsets that are out there.<p>We use an open source tile rendering engine that's entirely written in C. My goal is to identify its data access patterns. This will help me determine which functions need to be optimized, or how to reorder the data on disk to optimize for those data access patterns.<p>I'm going to eventually end up reading the whole codebase, but I'm certain that there are best practices for determining this kind of information that I am just unaware of. I'm vaguely familiar with gdb and valgrind, but I feel like I'm only scratching the surface of their capabilities.<p>What kind of tooling is everyone else using these days? My specific use case is on Linux, but I'd appreciate tips across the board.<p>I'm also interested to see if recompiling with llvm and clang would give me any performance increase. I see there are malloc replacements like tcmalloc and hoard. Does anyone have experience with these?
======
tmsh
For profiling: usually as a first pass I turn to PG. :) Seriously though, add
'-pg' to your CFLAGS and LDFLAGS, recompile, run, and look at the output in
gprof. It's a pretty good way to easily identify bottlenecks. You can pipe the
output to dot and graph the call graphs, etc. -- but I've found that less
useful than running the code that I'm trying to make more performant and
studying the first 20-30 lines of the gprof output.

<http://sourceware.org/binutils/docs/gprof/Compiling.html#Compiling>

There are better alternatives as well. But adding -pg first is just so easy,
and usually (I've found for my stuff) is enough...
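
To make the -pg workflow concrete, here is a minimal sketch, assuming gcc and
binutils gprof. The file name hot.c, the DEMO_MAIN macro, and both function
names are made up for illustration; the point is only that the quadratic
function should dominate the first lines of the flat profile.

```c
/* hot.c: toy workload for trying out gprof (all names here are hypothetical).
 *
 * Build and profile, assuming gcc and binutils gprof:
 *   gcc -pg -g -DDEMO_MAIN hot.c -o hot
 *   ./hot                            (writes gmon.out in the current directory)
 *   gprof ./hot gmon.out | head -30
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Deliberately quadratic: counts pairs (i, j) with i < j and equal values. */
unsigned long count_equal_pairs(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    unsigned long pairs = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (v[i] == v[j])
                pairs++;
    return pairs;
}

/* Cheap linear pass, so the profile has something to compare against. */
long sum_values(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#ifdef DEMO_MAIN
int main(void)
{
    enum { N = 30000 };
    static int data[N];

    for (size_t i = 0; i < N; i++)
        data[i] = (int)(i % 1000);

    printf("sum=%ld pairs=%lu\n",
           sum_values(data, N), count_equal_pairs(data, N));
    return 0;
}
#endif
```

If the optimizer inlines the small functions at higher -O levels, they may not
show up as separate entries, so it can be worth profiling an unoptimized build
first.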

For code discovery: I've experimented with strace as others mention (and
ltrace). And there are awesome things like Fenris in theory:

<http://lcamtuf.coredump.cx/fenris/devel.shtml>

But I could never really get them to work in practice, though I learned a lot
about what good integration at the terminal level could look like by browsing
them. At some point I hacked together vim and gdb integration that works
pretty well for my purposes (or, I should say, improved on the clewn project;
I'm pretty happy with it). I wonder if others have done similar things.
Anyway, I'm curious what others say as well.

------
stonemetal
AMD CodeAnalyst is free and pretty good if you have an AMD CPU. If you have an
Intel processor it still works; it just does less.

The two main data access performance tips are:

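1) Make sure your nested loops vary the rightmost index fastest. If you have
an array declared [10][9][8], the innermost loop should run over the 8
dimension, the middle loop over the 9, and the outermost loop over the 10. C
stores arrays in row-major order, so that is the cache-optimal ordering.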

2) Prefer SoA (structure of arrays) to AoS (array of structures). Say you have
an array of structures and you need to loop over it to update one field. You
get more cache hits if you instead make one structure that holds an array per
field, because the values you touch are then packed next to each other.
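
A rough sketch of both tips in C. The [10][9][8] shape comes from tip 1; every
type and function name below is invented for illustration, and how much faster
the friendly versions are depends on your data sizes and hardware, so measure
before restructuring anything.

```c
#include <assert.h>
#include <stddef.h>

enum { D0 = 10, D1 = 9, D2 = 8, NPOINTS = 1024 };

/* Tip 1: the last index is contiguous in memory, so it goes in the
 * innermost loop. Consecutive accesses are one int apart. */
long sum_cache_friendly(int a[D0][D1][D2])
{
    long total = 0;
    for (size_t i = 0; i < D0; i++)
        for (size_t j = 0; j < D1; j++)
            for (size_t k = 0; k < D2; k++)
                total += a[i][j][k];
    return total;
}

/* Same result, but the first index varies fastest, so consecutive
 * accesses are D1 * D2 ints apart. */
long sum_cache_hostile(int a[D0][D1][D2])
{
    long total = 0;
    for (size_t k = 0; k < D2; k++)
        for (size_t j = 0; j < D1; j++)
            for (size_t i = 0; i < D0; i++)
                total += a[i][j][k];
    return total;
}

/* Tip 2, AoS: updating only x still pulls y, z and flags into cache. */
struct point_aos {
    float x, y, z;
    int flags;
};

void bump_x_aos(struct point_aos *p, size_t n, float dx)
{
    for (size_t i = 0; i < n; i++)
        p[i].x += dx;
}

/* Tip 2, SoA: all x values sit next to each other, so the same update
 * only touches the memory it actually needs. */
struct points_soa {
    float x[NPOINTS];
    float y[NPOINTS];
    float z[NPOINTS];
    int flags[NPOINTS];
};

void bump_x_soa(struct points_soa *p, size_t n, float dx)
{
    assert(n <= NPOINTS);
    for (size_t i = 0; i < n; i++)
        p->x[i] += dx;
}
```

Timing the friendly and hostile variants on arrays much larger than your cache
is a quick way to see whether loop order matters for your workload.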

------
alextingle
Start with Drepper's "What every programmer should know about memory":

<http://people.redhat.com/drepper/cpumemory.pdf>

------
swolchok
strace might help you, perhaps with the -e trace=file option. Depends on how
mapserver is implemented.

It's not clear to me that this is a C-specific problem.

~~~
ghotli
It's not, really; everything needs to be profiled in high-load situations. I'm
more so looking for an overview of the tooling to see if I'm just unaware of a
vital tool. Thanks for the strace info, I'll check it out.

