The Magic of strace (chadfowler.com)
531 points by chadfowler on Jan 31, 2014 | 99 comments

Small, somewhat nit-picky critique: the man pages for system calls are in section 2. If you want to see the docs for the "read()" syscall, and not the bash builtin "read", saying "man read" w̶o̶n̶'̶t̶ may not (see follow-up) do what you expect. Instead, you should say

  man 2 read
This should probably be mentioned somewhere.

Otherwise, great writeup. Thanks for sharing!


Different Unix systems take different syntax for specifying the section. Long ago I learned to say "man -a read" and just get _all_ the "read" pages, scanning for the ones I want (and sometimes learning about things I didn't know were there).

I'm going to hijack this comment to ask a question that I haven't been able to google:

What is the number after the command called? For example, when I look up 'man sed', I find a manpage for 'sed(1)'[0]. When I look up 'man kill', I find manpages for 'kill(1)' and 'kill(2)'[2].

Can somebody tell me what that number is called so I can look it up? Thanks..

0: https://developer.apple.com/library/mac/documentation/Darwin...

1: https://developer.apple.com/library/mac/documentation/Darwin...

2: http://man7.org/linux/man-pages/man2/kill.2.html

That's the "section" I'm talking about. "sed(1)" means that the man page for that sed is in section 1. "kill(1)" versus "kill(2)" likewise: the former is the command; the latter is the syscall. To distinguish between them, you need to tell man(1) which section to look in for the desired context.
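To make the name(section) notation concrete, here is a minimal sketch that turns a reference like "kill(2)" into the matching man invocation. The helper name man_command is made up for illustration, and it assumes a man that accepts the section as a plain first argument (as in "man 2 read" above); some systems want a different syntax.

```python
import re

# Matches references such as "kill(2)" or "read(1p)".
REF_RE = re.compile(r"^(?P<name>[\w.+-]+)\((?P<section>\d\w*)\)$")


def man_command(ref):
    """Turn 'name(section)' into an argv list for man (hypothetical helper)."""
    m = REF_RE.match(ref.strip())
    if not m:
        raise ValueError("not a name(section) reference: %r" % ref)
    # Assumption: this man takes the section as its first argument.
    return ["man", m.group("section"), m.group("name")]


if __name__ == "__main__":
    print(" ".join(man_command("kill(2)")))
```

The list form can be handed to subprocess.run() as-is if you actually want to open the page.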

That number refers to the section of the man pages that the command is in. See http://en.wikipedia.org/wiki/Man_page#Manual_sections for an example list; however, it seems the list is not the same between systems.

There's a manpage bash-builtins in section 7, but I've never seen a system that had manpages for the individual builtins, let alone having them in section 1. "man read" on every system I've used opens the read manpage in section 2.

My local CentOS 6.2 VM has bash-builtins in section 1, which is where "man read" takes me. Same with a RHEL 6.4 machine in the colo.

This might be a distro-specific thing.

Same with OSX: "man read" opens the builtin(1) manpage. However on Debian 7.3, it opens read(2).

You can configure the priority of the sections in /etc/man.conf or similar.

Some syscalls share names with commands in section 1 though.

I can't remember a unix system for at least the last decade that hasn't given me the section 1 bash "read" instead of the section 2 system "read" on a "man read".

I've been on Ubuntu for several years, and I see the same as JoshTriplett. Must be platform dependent.

At least on my Mac, it opens the page for BUILTIN(1).

read is also documented by POSIX, and those man pages are often installed in section 1p.

This is why the documentation for COHERENT was changed from "man" pages to what we called a Lexicon, removing the concept of sections. Each entry would have the description of the type of the entry, such as system call, shell command, and so on. Steve Ness has made this available online at http://www.nesssoftware.com/home/mwc/manual.php

You can change the section search order with the SECTION directive in /etc/man_db.conf, or you can override that global setting with the MANSECT environment variable. I just learned this from "man man". :) I'm personally tempted to put "2 3" ahead of the rest, because most of the time I'm just trying to refresh my memory on function parameters.

Don't forget its userspace equiv (strace is syscalls), ltrace. This tracks all lib calls made by a process.

Under Windows, strace is an SSL/TLS monitoring tool (also hella useful). It shows payloads passed to CryptoAPI/CNG libs so you can easily troubleshoot explicitly encrypted protocols like ldaps. Especially useful if you use client-authenticated TLS, where it is not possible to use a TLS mitm proxy to snoop the layer 7 data.

Shameless plug: if you want to trace Windows applications you can take a look at my company's products SpyStudio[1] and Deviare[2]. Before downvoting me, try them to see how powerful and unique they are in the Windows ecosystem.

VMware is using SpyStudio for creating and troubleshooting application virtualization packages, this is, for example, a twitter post from a VMware escalation engineer: https://twitter.com/DooDleWilk/status/428562701313662977

[1] http://www.nektra.com/products/spystudio-api-monitor/

[2] http://www.nektra.com/products/deviare-api-hook-windows/devi...

Thanks, it actually works quite well!

Agreed, neat stuff. I installed it on my windows workstation.

Good call on ltrace. I thought about writing about that one next.

Please do. I found this article very useful.

You can also track non-lib calls: http://stackoverflow.com/q/311840/309483

Mac OS X has a suite of tools built on a similar package called dtrace—opensnoop and execsnoop. Gives really nice real time lists of all files opened on the system and all binaries executed, respectively.

Thanks for those, very cool and useful! Every time I see blog posts like the above, and comments like yours and others, it reminds me just how much stuff I either don't know or should learn more about.

So what happened with the Lotus system, and how did strace help?

+1! Don't leave us hanging like this.

OP: "I can’t remember the exact problem, but it had something to do with files not being properly accessible in its database"

He can ostensibly remember the exact outcome, though. I read the whole post waiting for resolution, and realized that I'm going to do this in my next post as well. Start with a story, interrupt it, finish it at the end.

Got re-written on top of Eclipse.

Running version 7 on my work laptop. :(

There aren't enough sadfaces in the world for Lotus users.

The client was rewritten on top of Eclipse, not the server.

You are right, I should have been clearer.

if you read all the way to the end, he didn't really use strace. He used truss on Solaris.

Thanks for the writeup! strace should definitely be in your toolbox. There is also systemtap, which I like a lot as well. It has some problems on Linux though, especially that it is only widely supported on Linux > 3.5 if the distro you are using does not ship with patches. Custom userspace probes are a real strong point.

I wrote a short article about stap using Ruby's probes as an example: http://www.asquera.de/blog/2014-01-26/stap-and-ruby-2

To clarify, systemtap needs UTRACE or UPROBES kernel support to trace user processes. Without those you can still inspect kernel functions. IIRC UTRACE was mainly supported by Red Hat, and available in their kernels for quite a few years. In 3.5 the UPROBES functionality was merged into mainline. Other tools, like perf, can use UPROBES support as well.

Yep, that's the detailed story. A lot of Linuxes (especially current Ubuntu LTS) ship without those patches, though, which makes the whole exercise of compiling the kernel yourself (and maybe a debug image to go along with it) a tedious one, and probably not fit for a production environment.

That's why I decided to focus on Linux > 3.5, which will be available with Ubuntu 14.04 LTS, where installing gets much easier if you know the right packages. I definitely wanted to make sure that people can start playing around with it in a few minutes.

Also, UPROBES are in my opinion one of the most interesting features to grab with systemtap. They allow you to easily combine detailed kernel-level tracing with tracing of your application.

(Not that I want to suggest that compiling your own kernel is hard to do, it just takes the fun out of "let's trace!")

You can use Process monitor http://technet.microsoft.com/en-us/sysinternals/bb896645.asp... to see a similar overview of low level activity. You won't see all the system calls, you can't pipe the output directly, but there is a UI and you don't have to look up file descriptors

If you want to see the DLL calls, exceptions, etc, you can use the tools I posted here: https://news.ycombinator.com/item?id=7156160

Yes, which uses ETW, which is the equivalent to DTrace for Windows and something that should be known about (but by and large isn't) for Windows opers

If you like Process Monitor, you'll love xperf.

If you think strace is useful, wait until you try dtrace.

or LTTng with both kernel and userspace tracers http://lttng.org

Disclaimer: LTTng developer :)

I was thinking that too. It is my humble opinion that dtrace will knock strace out of its socks.

I find that strace (or dtruss) is more useful when you know less about what exactly you're trying to find. A log of syscalls is often exactly the right granularity to find out the gist of what a process you didn't write is trying (and failing) to do.

So I think the comparison, despite the superficial similarity and similar mechanism of action, isn't really fair.

dtrace is cool, and it benefits from source instrumentation (probes).

strace / dtruss just spit out kernel syscalls of any userland program without modifications.

Also dtrace, as seen on smartos

while we wait, could you elaborate on that? ;)

strace shows syscalls -- it's effectively truss.

That's useful and all, but what if you want to instrument arbitrary parts of a program, not just the syscall interface? By function or instruction? Either in userspace or in kernel? With statistical functions? And speculative tracing? And extensive control flow (except loops, which prevent certain safety guarantees DTrace makes). And a lot more.

Don't be fooled by the single-letter change: strace is to DTrace what edlin is to emacs. Or something else ridiculously extreme. They're barely comparable.

Still doesn't tell me what dtrace does.

Or you could just give a concise definition:

    >What is this "DTrace" thing? It stands for "Dynamic Tracing",
    >a way you can attach "probes" to a running system
    >and peek inside as to what it is doing.

It was my mistake to link you to a tutorial on the thing you are asking about. I don't know what came over me.

It's like awk, except that you match entry/exit of syscalls, function calls, method invocations (in ObjC/Java), and give code to execute with access to arguments, return values, stack trace, etc.

It can be used to write tools like strace (see "dtruss" on OSX), iotop, topsyscall, etc.

On Mac OS X, the way to get an idea of what dtrace can do is "apropos dtrace". That shows you the dtrace scripts that the OS ships with.

Thanks! It looks as though by default strace isn't available on Mac but dtrace is!

Has anyone heard of a program that will take strace (or dtrace) output and create a pretty diagram showing which commands call which commands and which files they read or create?

We've got a fairly complicated bioinformatics pipeline that calls about 100 other programs, and creates or reads about 100 different files. I'd love a way to create a picture of what's going on. Which files each program uses, etc.

If such a program doesn't exist, would that be worth building? Could it be something I could potentially sell?

Sounded like an interesting script, so I just wrote it in about a half hour. You're welcome to sell it if you want... (also, there's probably a default recursion limit of 100; unroll the recursion in walkpid to go farther) Collect logs with 'strace -o pids.log -e trace=process -f [specify your process here]', run with 'perl printpids.pl < pids.log'

  #!/usr/bin/perl -w
  # Print the programs each process exec'd, indented by fork depth.
  use strict;
  my (%pidmap, @order);
  while ( <> ) {
      # Each "strace -f -o" log line starts with the PID, then the syscall.
      if ( /^(\d+)\s+(\w+)(.*)$/ ) {
          my ($pid, $syscall, $args) = ($1, $2, $3);
          # A successful clone/fork returns the child's PID (> 0).
          if ( $syscall =~ /(^clone$|fork$)/ and $args =~ / = (\d+)$/ and $1 > 0 ) {
              my $clonepid = $1;
              $pidmap{$clonepid} = { -parent => $pid };
              push(@order, $clonepid);
          }
          # A successful exec returns 0; record the program path.
          elsif ( $syscall =~ /^exec/ and $args =~ / = (\d+)$/ and $1 == 0 ) {
              my $exec = $args;
              @order = ($pid) if !@order;
              $exec =~ s/^\("([^"]+?)",.*$/$1/g;
              push( @{ $pidmap{$pid}->{-exec} } , $exec );
          }
      }
  }
  foreach my $pid ( @order ) {
      my $spaces = walkpid($pid);
      # "|| []" guards against PIDs that forked but never exec'd.
      print "    " x $spaces . join("\n" . ("    " x $spaces), map { $_ . " ($pid)" } @{ $pidmap{$pid}->{-exec} || [] } ) . "\n";
  }
  # Return the number of ancestors of $pid seen in the log.
  sub walkpid {
      my $pid = shift;
      my $c = shift || 0;
      if ( exists $pidmap{$pid}->{-parent} ) {
          return walkpid($pidmap{$pid}->{-parent}, $c+1);
      }
      return $c;
  }

Wow that's amazing! Can I have strace launch the program? Or does it already have to be running?

you can, last strace argument can be a command you want to strace

Valgrind's callgrind tool will profile all calls a program makes. You can then feed the output to kcachegrind (or qcachegrind for the Qt version) which will nicely visualize the profiling run.

"perf top -e syscalls:statfs"

particularly when you don't know which process is calling all the syscalls.

Mix "perf record" and "perf trace" & you have the next generation of strace tools.

So they integrated that upstream, I always wondered what happened to the tool announced here: http://lwn.net/Articles/415728/

  perf --help
    trace           strace inspired tool

I use strace all the time doing ops at Crittercism. Some of the random things it's helped with/taught me:

- allowed exploring forking behavior of daemons, in particular the nitty-gritty of gunicorn's prefork behavior, and understanding the rationale behind single- and double-fork daemons generally (very important to understand for job control e.g. writing upstart/init.d jobs)

- isolated hot reads to memcache in situ, by identifying the socket associated with the memcache connection, and finding which key was read the most by a process (we built better logging after the fact, but sometimes there's no substitute for instrumenting prod during tough perf/stress problems)

- let me explore the behavior of node.js's several threads, and find one of them sending "X" over a socket to the other (still not quite sure what this is, some kind of heartbeat/clock tick?)

- helped understanding "primordial processes" and the exact details of how forking/reparenting work on linux

It's a great tool and one that every ops/infrastructure engineer should be familiar with.
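For anyone who hasn't seen the double-fork dance mentioned above, here is a rough, Unix-only sketch of it. The function name double_fork and the pipe-based report are made up so the result can be inspected; a real daemon would also chdir, reset its umask and redirect its standard file descriptors.

```python
import os


def double_fork():
    """Fork twice; return (first_child_pid, grandchild_pid, grandchild_sid)."""
    read_fd, write_fd = os.pipe()
    first_child = os.fork()
    if first_child == 0:
        # First child: start a new session, detaching from the parent's
        # controlling terminal.
        os.close(read_fd)
        os.setsid()
        if os.fork() == 0:
            # Grandchild: not a session leader, so it cannot reacquire a
            # controlling terminal. A real daemon would keep running here.
            report = "%d %d" % (os.getpid(), os.getsid(0))
            os.write(write_fd, report.encode())
            os._exit(0)
        # The first child exits at once, so the grandchild is reparented.
        os._exit(0)
    # Original process: reap the first child, then read the report.
    os.close(write_fd)
    os.waitpid(first_child, 0)
    data = b""
    while True:
        chunk = os.read(read_fd, 1024)
        if not chunk:
            break
        data += chunk
    os.close(read_fd)
    grandchild_pid, session_id = (int(x) for x in data.split())
    return first_child, grandchild_pid, session_id


if __name__ == "__main__":
    print(double_fork())
```

Running a script like this under strace -f -e trace=process should show the two fork/clone calls and the early exit of the intermediate process, which is the part that matters for job control.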

let me explore the behavior of node.js's several threads, and find one of them sending "X" over a socket to the other (still not quite sure what this is, some kind of heartbeat/clock tick?)

I don't know about node.js specifically, but this is a common pattern to wake another thread that uses a select()-style event loop.
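A minimal sketch of that wake-up pattern, not node.js's actual implementation: one thread blocks in select() on one end of a socket pair, and another thread writes a single byte to the other end to wake it. The function name wake_demo and the b"X" payload are just for illustration.

```python
import select
import socket
import threading


def wake_demo(timeout=5.0):
    """Wake a select()-based loop by writing one byte to a socket pair."""
    wake_r, wake_w = socket.socketpair()
    received = []

    def event_loop():
        # Blocks until the wake-up socket is readable or the timeout expires.
        readable, _, _ = select.select([wake_r], [], [], timeout)
        if wake_r in readable:
            received.append(wake_r.recv(1))

    worker = threading.Thread(target=event_loop)
    worker.start()
    # The payload is irrelevant; the byte only makes the socket readable.
    wake_w.send(b"X")
    worker.join()
    wake_r.close()
    wake_w.close()
    return received


if __name__ == "__main__":
    print(wake_demo())
```

In strace output this shows up as a one-byte write on one descriptor followed by select/poll returning in the other thread, which would look a lot like the "X" described above.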

It's still no dtrace. Something that I hope enters osx soon.

dtrace has been in osx since 10.5 :/

Quick strace command that I use all the time to see what files a process is opening:

    strace -f <command> 2>&1 | grep ^open
Really useful to see what config files something is reading (and the order) or to see what PHP (or similar) files are being included.

There's normally other ways to do this (eg using a debugger) but sending strace's stderr to stdout and piping through grep is useful in so many cases it's become a command I use every day or 2.
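One caveat, as far as I know: with -f and output going to stderr, strace prefixes lines from child processes with "[pid N]", and newer libcs tend to call openat rather than open, so an anchored grep can miss some lines. Here is a small sketch of a more forgiving filter. The function name opened_paths is made up, and the sample lines in the test are invented, not captured from a real run.

```python
import re

# Optional "[pid N]" prefix, then open("path", ...) or openat(dirfd, "path", ...).
OPEN_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+)?'
    r'(?:open\(|openat\([^,]+,\s*)'
    r'"(?P<path>[^"]*)"'
    r'.*\)\s+=\s+(?P<ret>-?\d+)'
)


def opened_paths(lines, include_failed=False):
    """Return the paths passed to open/openat, in order of appearance."""
    paths = []
    for line in lines:
        m = OPEN_RE.match(line)
        if not m:
            continue
        # A negative return value means the open failed (e.g. ENOENT).
        if int(m.group("ret")) < 0 and not include_failed:
            continue
        paths.append(m.group("path"))
    return paths
```

Feed it the lines of a saved log (strace -f -o trace.log <command>, minus the leading PID column) or of captured stderr. Passing include_failed=True is handy for spotting config files a program looked for but did not find.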

Hey, that's really useful! Cheers :)

For OSX you need to use dtruss, for NetBSD and FreeBSD ktrace is what you need.

ktrace is a little more broad and has much different invocation; truss is most comparable on FreeBSD and SysV.

Strace is easy to use, commonly available, and very useful in many situations.

More modern tools such as dtrace for Solaris and systemtap for Linux address similar problems but with broader coverage.

Also check out ltrace... Shows the calls to other libraries the process is making...

I'd also like to point out that a key to using strace successfully is the result column... Programs that fail often make system calls that fail right before they exit... You can often tell what the program is trying and failing to accomplish...
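To illustrate reading the result column, here is a rough sketch that pulls out only the calls that failed, assuming the usual "name(args) = ret ERRNO (message)" layout. The regex is an approximation rather than a full strace parser, the function name failed_calls is made up, and the sample lines in the test are invented.

```python
import re

# Optional "[pid N]" or bare PID prefix, then name(args) = ret [ERRNO (message)].
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>0x[0-9a-f]+|-?\d+|\?)"
    r"(?:\s+(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\))?"
)


def failed_calls(lines):
    """Return (syscall, errno) pairs for calls that reported an error."""
    failures = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Only failed calls carry an errno name after the return value.
        if m and m.group("errno"):
            failures.append((m.group("name"), m.group("errno")))
    return failures
```

The last few entries before the process exits are usually the ones worth looking at first.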

In case you're curious, this is how ltrace (strace's library equivalent) works: http://www.opersys.com/blog/ltrace-internals-140120

I’ve used strace before to help diagnose issues with buggy software I was using and I thought this was a great article.

I just thought I’d let people know that it can be a lot easier to read strace’s output if you read the output log file using Vim as it contains a syntax file which can highlight PIDs, function names, constants, strings, etc. Alternatively, if you don’t want to create an strace log file, you could pipe the output to Vim and it will automatically detect it as being strace output, e.g.

  strace program_name 2>&1 | vim  -

strace taught me that glibc never does what you think it does behind the scenes!

Totally agree that strace is an awesome tool. I've even used it with Java apps that were behaving weirdly, just attach and see what it is saying to the kernel.

Bending to the will of the people, I have appended a conclusion, clarifying the fate of the Lotus Domino server. http://chadfowler.com/blog/2014/01/26/the-magic-of-strace/

Just a few hours ago, a newly minted Ubuntu binary was crashing due to a library version mismatch. I thought I had updated the shared libraries to point to the new versions. But definitely something was still hooked to the old version. I just couldn't figure out how/where. ldd wasn't of much help because everything was OK according to it. "If only I can get a bit more info when the binary is running and spit out everything before the crash."

Tried my luck with gdb. Sure enough...there was libQt5DBus pointing to the old libs leading to the crash. If you are feeling particularly adventurous, you can step one instruction at a time after starting. Even without debug symbols, there is quite a lot of info that can be used while troubleshooting.

There's a lot of fun to be had with strace. I wrote a tiny perl script that spies on the file descriptors of another process and outputs it to your terminal: https://github.com/psypete/public-bin/blob/public-bin/src/sy...

Even in the 90s Java decompilers existed, so the "We had no source code" excuse sounds a bit strange :-)

Oh we used those too, but in this case there were also native libraries. I was a regular user of jad, even sometimes recompiling and replacing stuff (ooooweeee) in production.

It's instructive to see how much simpler the strace output for a simple program is when the program is statically linked. Especially if you use an alternative libc like musl (http://musl-libc.org/).

Don't forget that sometimes strace is overkill, and similar more easily parsed things can be used instead, for example, /usr/bin/time (vs bash time) has been coming in more and more handy for me.

The first level up on java is being able to tell useful things about it via simply straceing it. Once again another win for dtrace.

Useful article. But the background of the blog flickers rather badly, it's pretty migraine inducing.

My favorite use of strace is to learn which files (especially config files) are being opened by a new daemon/tool: strace -f -s1024 <command> 2>&1 | grep open

Also remember the equally useful 'ltrace' for tracing library calls.

Unix tools? You mean Linux....

Please see

This is a tool called ptrace - which does everything that strace does and a lot more. You have working binaries in there, and most of the source - I haven't extricated the full build dependencies so it all builds, but this includes extra facilities like reporting summaries of process trees, showing only connections or files, and shlib injection into a target process.

If people are interested more on this, contact me at CrispEditor-a.t-gmail.c-o-m

I am a bit paranoid about downloading a binary from a site without a domain name.

I can understand being twitchy but it is so easy to get a domain name that it would literally be no barrier at all to an attacker. Not wanting a binary (especially one that will run with root privs) to come from a domain you don't recognise and trust is understandable. Even then, serving it over https and/or as a signed package is only the first step to security.

Since IP address ranges are allocated in blocks to ISPs, you can do an IP lookup and discover that this fellow is in the UK using a Virgin Cable connection.

So now we know his ISP too. Useful, eh?

ftp://crisp.dyndns-server.com/pub/release/website/tools/trace-20140126-x86_64-b95.tar.gz works too if it makes you feel any better.

dyndns ain't much better.

Indeed. Dyndns tends to be the source of much Internet clap...

Why do you link directly to the download file? A link to the tool's man page[1] would be sufficient.

[1] http://linux.die.net/man/2/ptrace

I might be wrong, but I'm reasonably sure that you link to the system call's man page. Which is probably what the tool in question uses, but .. not the same thing.

Elsewhere in this discussion: There's a difference between man page section 1 and 2 - and read was quoted as an example for a potential ambiguous result if you invoke "man read" (opens man 1 read here, when man 2 read was the syscall I might want to look at after running strace).

Yes, you are probably right, but the direct link to .tar.gz was not reasonable.
