
Strace - The Sysadmin's Microscope - epenn
http://blogs.oracle.com/ksplice/entry/strace_the_sysadmin_s_microscope
======
jrappleye
It's also worth looking at SystemTap or DTrace, depending on what OS you're
running. While strace lets you look at an individual process and its
children, SystemTap/DTrace let you gather data on system call usage (and
then some) system-wide. Some examples:

- monitor execve() calls system-wide

- monitor I/O to a specific file, from any process

- measure per-process network usage

(note that newer Linux kernels may have other ways of accomplishing some of
these tasks that I'm not aware of).

I've had a lot of success using SystemTap to look at low-level filesystem
performance issues in the kernel. We've run SystemTap scripts on our
production filesystem servers for over a year with no problems whatsoever.

Edit: formatting

~~~
JoshTriplett
On a current Linux system, you can monitor several of the items you mention
using "perf".

------
yan
Also very useful: ltrace.

ltrace is like strace, but does library calls instead of system calls.

------
minimax
strace tip #31415927: If your program is I/O bound, sometimes you can improve
performance by increasing the size of the read buffer. A bigger buffer means
fewer system calls and potentially increased performance. How do you know how
big the read buffer is? Sometimes it's hard to tell even if you have the
source (e.g. you're using stdio). With strace you can see the number of bytes
you're trying to slurp in with each read system call. If it looks like a small
number, you can then go figure out how to make it a bigger number, perhaps
using setvbuf or rolling your own buffered I/O.
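
A rough way to see the effect, assuming Python is at hand: the sketch below reads the same temporary file with two chunk sizes through `os.read` and counts how many calls each one needs. The names `count_reads` and `demo` and the sizes 512 and 65536 are illustrative choices, not anything from the comment above.

```python
import os
import tempfile


def count_reads(path, chunk_size):
    """Read the whole file with os.read() and return how many calls it took."""
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk_size)  # each os.read() is one read request
            calls += 1
            if not data:  # empty bytes means end of file
                break
    finally:
        os.close(fd)
    return calls


def demo(size=1 << 20):
    """Create a temporary file of `size` bytes and compare two chunk sizes."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * size)
        path = f.name
    try:
        return {chunk: count_reads(path, chunk) for chunk in (512, 65536)}
    finally:
        os.unlink(path)


if __name__ == "__main__":
    for chunk, calls in demo().items():
        print(f"chunk={chunk:6d} bytes -> {calls} read() calls")
```

Running the script under `strace -e trace=read` should show the same pattern in the size argument of each read. In C with stdio, setvbuf plays the role that `chunk_size` plays here.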

~~~
dap
Yes, buffer size can have a significant effect on performance. You can quickly
see buffer sizes used by "read" across your system with "dtrace -n
'syscall::read:entry{ @ = quantize(arg2); }'", which summarizes the output (in
case you're doing more of these than you can reasonably see in the console)
and has significantly less impact on the program you're tracing. Output for my
system:

    
    
               value  ------------- Distribution ------------- count    
                   0 |                                         0        
                   1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1278     
                   2 |                                         0        
                   4 |                                         0        
                   8 |                                         0        
                  16 |                                         0        
                  32 |                                         0        
                  64 |                                         0        
                 128 |                                         0        
                 256 |                                         0        
                 512 |                                         1        
                1024 |                                         0        
                2048 |@                                        20       
                4096 |                                         2        
                8192 |                                         0        
               16384 |                                         0        
               32768 |                                         0        
               65536 |                                         3        
              131072 |                                         2        
              262144 |                                         1        
              524288 |                                         0    
    

and if you want to know where, say, the 512-byte reads are coming from:

    # dtrace -n 'syscall::read:entry/arg2 == 512/{ @[ ustack() ] = count(); }'

    
    
      mdworker                                          
                  libSystem.B.dylib`read+0xa
                  Foundation`-[NSConcreteFileHandle readDataOfLength:]+0x1d6
                  RichText`GetMetadataForURL+0x338
                  mdworker`0x100006a66
                  mdworker`0x100009ec1
                  libSystem.B.dylib`_pthread_start+0x14b
                  libSystem.B.dylib`thread_start+0xd
                    1
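
For readers without DTrace, the power-of-two bucketing that quantize() performs can be approximated in a few lines of Python. This is only a sketch of the idea, not DTrace's implementation: the function names are made up, the sample sizes are hypothetical, and negative values are not handled.

```python
from collections import Counter


def pow2_bucket(value):
    """Return the largest power of two <= value (0 for value 0)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative sizes")
    if value == 0:
        return 0
    return 1 << (value.bit_length() - 1)


def quantize(values):
    """Count values per power-of-two bucket."""
    return Counter(pow2_bucket(v) for v in values)


def render(hist, width=40):
    """Format the histogram as text rows: bucket, bar, count."""
    total = sum(hist.values())
    rows = []
    for bucket in sorted(hist):
        bar = "@" * (hist[bucket] * width // total)
        rows.append(f"{bucket:>8} |{bar:<{width}} {hist[bucket]}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical request sizes, not taken from the output above.
    sizes = [1, 1, 1, 512, 700, 4096, 5000, 65536]
    print(render(quantize(sizes)))
```

Each row counts the values from its label up to, but not including, the next label, which is why a 700-byte request lands in the 512 row.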

------
spullara
For those of you on a Mac, which doesn't have strace but does have dtrace,
here are some preinstalled dtrace scripts you have at your fingertips:

- dtruss - similar to strace

- opensnoop - all files

- nettop - all network access

- iosnoop/iotop - all io

- execsnoop - all new processes

- errinfo - all system calls resulting in errors

Not exactly the same info, but I think much more powerful, as it is system-wide
and you can always filter out what you don't need to know, or write your own
scripts!

------
dicroce
I use strace all the time (and ltrace too)... I honestly don't know how people
get by without it...

BTW, the Windows equivalent is called "Process Monitor" from the SysInternals
guy...

------
jonmrodriguez
What are the args to make Mac OS's "dtrace" behave like "strace" does when
given no options?

~~~
bodyfour
Look at dtruss.

Unfortunately you need to "sudo" to use it, since dtrace needs root
permissions.

------
dekz
Helped me troubleshoot why sudo was taking 25+ seconds yesterday. Apparently
it was timing out attempting to perform some NIS operations on a misconfigured
setup.

~~~
Maxious
I've had this issue too. Apparently it used to fail completely if the hostname
lookup failed, even if your sudoers didn't talk about hostnames at all:
<https://bugs.launchpad.net/ubuntu/+source/sudo/+bug/32906>

------
tocomment
Does anyone know of anything that will parse the output of strace or dtrace?
What I'd like to do is generate a graph of my pipeline showing which program
calls which other program, how long each call takes to return, which files it
uses, etc.

I think it would be a great visualization to go along with my documentation.

~~~
jrappleye
A little Googling turned up this (definitely not the original source, though):

<https://github.com/CyanogenMod/android_external_strace/blob/gingerbread/strace-graph>

    
    
      $ strace  -tt -q -o graph.strace -s 100 -f bash -c 'ls |wc -l'
      $ ./strace-graph graph.strace 
      bash -c ls |wc -l
        +-- ls
        `-- wc -l
    

The comments in the code claim it will show elapsed time for each process, but
that's not working for me.

I discovered the Python ptrace module while I was searching for this. I have a
project for which modifying the Python module might be a nice alternative to
parsing strace output.
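
As a starting point for the parsing route, here is a minimal sketch that pulls execve and fork-like calls out of `strace -f -o FILE` output and prints a process tree. It assumes each line starts with a PID, optionally followed by a `-tt` timestamp, and it ignores `<unfinished ...>` / `resumed` lines entirely. The sample lines, PIDs, and addresses are hand-written for illustration, not real output, and elapsed time is not computed.

```python
import re

# "PID [timestamp] execve("path", ...) = 0" in `strace -f -o FILE` output.
EXEC_RE = re.compile(r'^(\d+)\s+(?:[\d:.]+\s+)?execve\("([^"]+)".*\)\s+=\s+0$')
# A successful fork-like call; captures the parent PID and the returned child PID.
FORK_RE = re.compile(r'^(\d+)\s+(?:[\d:.]+\s+)?(?:clone3?|v?fork)\(.*\)\s+=\s+(\d+)$')

# Hand-written sample in the assumed format (hypothetical PIDs and addresses).
SAMPLE = """\
1000 10:00:00.000000 execve("/bin/bash", ["bash", "-c", "ls |wc -l"], 0x7ffc00000000 /* 20 vars */) = 0
1000 10:00:00.010000 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0000000000) = 1001
1001 10:00:00.011000 execve("/bin/ls", ["ls"], 0x550000000000 /* 20 vars */) = 0
1000 10:00:00.012000 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0000000000) = 1002
1002 10:00:00.013000 execve("/usr/bin/wc", ["wc", "-l"], 0x550000000000 /* 20 vars */) = 0
""".splitlines()


def parse(lines):
    """Return (programs, children): pid -> exec'd paths, parent pid -> child pids."""
    programs, children = {}, {}
    for line in lines:
        line = line.rstrip()
        m = EXEC_RE.match(line)
        if m:
            programs.setdefault(int(m.group(1)), []).append(m.group(2))
            continue
        m = FORK_RE.match(line)
        if m:
            children.setdefault(int(m.group(1)), []).append(int(m.group(2)))
    return programs, children


def tree(pid, programs, children, depth=0):
    """Render the process tree rooted at pid as a list of indented lines."""
    name = programs.get(pid, ["?"])[-1]  # last program the pid exec'd
    out = ["  " * depth + f"{pid} {name}"]
    for child in children.get(pid, []):
        out.extend(tree(child, programs, children, depth + 1))
    return out


if __name__ == "__main__":
    programs, children = parse(SAMPLE)
    print("\n".join(tree(1000, programs, children)))
```

To try it on a real trace, replace `SAMPLE` with the lines of the file written by `-o` and pass the PID from the first line as the root. Adding per-process timing or file usage would mean matching the exit lines and open calls as well.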

------
JonnieCache
OS X has a fancy GUI for this called Instruments.

(Obviously the cmdline tools are there as well.)

~~~
bingbing
Instruments is a frontend to dtrace. I think the closest equivalent to strace
on OS X is dtruss.

Additionally, OS X has several useful stock dtrace scripts; check 'apropos
dtrace' for a listing.

------
onedognight
Does anyone know of a usable (no sudo needed) strace for Darwin / OS X?

~~~
brown9-2
This is a tangent but why is sudo needed != usable?

~~~
onedognight
I cannot think of a single time, when debugging an errant program, that I
wanted it to run as root. I can, however, think of many things (debugging
permission issues, resource limit issues, reading and writing files, etc.)
where running as root makes a big difference. I know you can su and then
su back to yourself, but that's a pain and things aren't exactly the same
anymore, i.e. there is a usability problem.

~~~
seanp2k2
You can send your complaints to Apple and hope that they fix it, or use free
software that doesn't do stupid things like this. IMO, you kind of forfeited
your right to complain when you started using non-free software.

