
Strace – My Favourite Secret Weapon (2011) - zwischenzug
https://zwischenzugs.com/2011/08/29/my-favourite-secret-weapon-strace/
======
orivej
Re. _My Library Is Not Loading!_, a pretty useful trick to find what files
are missing is to extend your path env var with an empty directory and then
grep strace output for its occurrences, e.g.

    
    
      mkdir /tmp/empty
      export LD_LIBRARY_PATH+=:/tmp/empty  # or PATH, or PERL5LIB, etc.
      strace -f program |& grep /tmp/empty
    

I had a need to track what files are used by what processes spawned by a
program (to infer dependencies between the processes), and it seemed (and
probably was) simpler to use ptrace directly rather than to parse strace
output, so I wrote
[https://github.com/orivej/fptrace/](https://github.com/orivej/fptrace/) .
Soon it turned out useful to dump its data into shell scripts that reflect the
tree of spawned processes and let you rerun an arbitrary subtree. I mostly use
it to debug build systems, for example to trace _configure_ and examine what
env var affected a certain check in a strange way, or to trace _make_ and
rerun the C compiler with _-dD -E_ to inspect an unexpected interference
between includes.

~~~
ereyes01
This is a cool trick. On Linux, the ldd command also tells you what libraries
a program wants to link to. You can also play around with LD_LIBRARY_PATH to
manipulate what ldd tells you.

~~~
hugelgupf
Binaries can also use dlopen to open libraries that aren't listed by ldd.

~~~
ereyes01
Good point, dlopen is fully run-time, and the strace approach will catch those
too, so it's more comprehensive.
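
A minimal sketch of how the two views could be compared, assuming an strace log has already been captured to a file. `trace.log` and `./program` are placeholder names, and `opened_libs` is a hypothetical helper, not part of strace or ldd. It lists every shared object that was opened successfully at run time, which would include anything loaded through dlopen:

```shell
# Capture a log first, for example:
#   strace -f -o trace.log ./program     (placeholder names)

# Hypothetical helper: read an strace log on stdin and print the shared
# objects that were opened successfully (names ending in .so or .so.N).
opened_libs() {
  grep -E 'open(at)?\(' \
    | grep -vE ' = -1 ' \
    | sed -n 's/^[^"]*"\([^"]*\.so\(\.[0-9]*\)*\)".*/\1/p' \
    | sort -u
}

# Usage: opened_libs < trace.log
```

Comparing this list with the output of `ldd ./program` should point at libraries that only show up at run time.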

------
brendangregg
No mention of overhead. strace can bring down your production environment --
it is not safe. This is why I wrote this:
[http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-...](http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html)

"perf trace" has improved since I wrote that, and may well be mostly strace-
equivalent (but without crippling overhead) in the latest Linux.

~~~
zwischenzug
Whenever I've had to use it in prod (in heavy OLTP environments) the
seriousness of the issue has always outweighed any performance concerns. Ditto
tcpdump. Often it was used specifically to determine the cause of performance
issues. In any case you generally only strace 1 process, and if your
application stack depends on one process you're probably in other kinds of
trouble... unless it's erlang :)

~~~
brendangregg
It's not a choice between strace or nothing. It's a choice between strace,
ftrace, perf, or eBPF -- and that's just the Linux builtins. Many low overhead
addons can also do syscall tracing (sysdig, LTTng).

I often run ftrace, perf, and eBPF on our production instances for syscall
tracing. If I ran strace, the instance would suddenly be very slow, and it
would trigger Hystrix (and other) timeouts and be removed from the ASG and
auto terminated. Our environment is fault-tolerant, so yes, we can run strace
-- you just don't get much output, and the load vanishes from the instance you
are looking at.

~~~
Bromskloss
Which of those alternatives are an option if the goal is to inspect and
rewrite syscall arguments and return values, and do other things in between?

------
zwischenzug
Nice to see something I wrote 7 years ago hasn't gone entirely to waste. Spot
the 'ancient' technologies like Solaris and perl mentioned there...

~~~
pjc50
> I’m often asked in my technical troubleshooting job to solve problems that
> development teams can’t solve

This sounds like a really interesting job - consultancy? How did you get
started in it?

Strace was always my go-to on Linux for solving error messages that fail to be
a complete sentence: "connection refused" (to what?) "file not found" (where
did you look?) and so on. Since I've moved to Windows the best alternative
seems to be Process Explorer.
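
As a rough sketch of that workflow, assuming the program's syscalls were logged with something like `strace -f -o trace.log ./program` (placeholder names), the two hypothetical helpers below pull the missing detail out of the log. They rely only on the usual ` = -1 ERRNO (message)` suffix that strace prints for failed calls:

```shell
# Hypothetical helper: print every syscall line that failed with
# "file not found" (ENOENT) or "connection refused" (ECONNREFUSED).
strace_failures() {
  grep -E ' = -1 (ENOENT|ECONNREFUSED) '
}

# Hypothetical helper: print only the first quoted argument (normally the
# path) of each call that failed with ENOENT.
missing_paths() {
  grep -E ' = -1 ENOENT ' | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p'
}

# Usage:
#   strace_failures < trace.log    # which connect() or open() failed, and on what
#   missing_paths   < trace.log    # where the program looked for its files
```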

~~~
techsupporter
> This sounds like a really interesting job - consultancy?

I'm not OP but I do the same thing, solve the problems and bugs that the dev
team can't. In my world, this is simply "Operations" or, in later years,
"Systems Engineer." I'm really good at doing that but I am not at all skilled
at large-scale software development even though I can read and understand
already-written code as part of my troubleshooting.

Descriptions like the author's are why I'm dismayed that Ops has such a bad
reputation in the modern computing industry and why I'm not so keen on the
combination of the two roles into "DevOps." Troubleshooting a working, in-
motion system requires a different set of skills, in my experience, from
writing and debugging code. It's also a set of skills that doesn't seem to
overlap very often. The two roles go hand-in-hand, of course, but asking
someone to do both in a large environment hasn't ended well (again, in my
experience).

~~~
devonkim
The anti-pattern / co-option of devops as a role is almost entirely
overlapping with companies undergoing digital transformations and advised by
some CIO magazine, Big Four, or Gartner team that it’s important for said
transformation. Because the cultural transformation in itself is already being
branded as something new and idiosyncratic to the company the term “devops” as
the cultural change itself is dropped to avoid confusion.

In most large non-tech companies, operations is a cost center budgeted
alongside facilities and other basic costs of doing business. It is usually run
by managers who come from an IT or traditional business background focused on
cost efficiency rather than from a software background. I’m not sure whether
this designation or Taylorism is fundamentally the root cause of why “devops”
in the cultural sense is not really happening.

------
lucb1e
Strace and Wireshark. If a program's output doesn't immediately tell me what
the hell its issue is, I'll just dive into this depending on the kind of
trouble.

The only thing I'd still like to have is something like strace but for general
function calls. Often, the issue is within the application and does not show
in syscalls. I guess this should be possible with gdb, but I haven't looked
into it yet, also because any meaningful names are often stripped from the
binaries.

~~~
schoen
You might enjoy using ltrace! It does pretty much what you're looking for.
(Although it only shows calls into dynamic libraries and not other internal
function calls, as far as I know. Maybe that's different if you have debugging
symbols in the binary?)

~~~
lucb1e

        $ whatis ltrace
        ltrace (1)           - A library call tracer
    

That seems nice indeed. It seems to make things a lot slower, but I guess
that's to be expected from tracing at this level!

------
vram22
As I said in a comment on the OP, lsof (LiSt Open Files - it does more than
its name suggests) is pretty useful too.

fuser is useful too, and available on many Unixen. I've used "fuser -k" many
times to find and kill rogue processes.
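
For a quick look without either tool, a Linux-only sketch that reads `/proc` directly can give roughly the per-process view of `lsof -p PID`. This is a swapped-in technique, not how lsof or fuser are invoked, and `list_fds` is a hypothetical helper name:

```shell
# Hypothetical helper (Linux only): print each open file descriptor of a
# process together with the file, socket, or pipe it points to.
list_fds() {
  pid=$1
  for fd in /proc/"$pid"/fd/*; do
    # Each entry is a symlink; skip the unexpanded glob if nothing matched
    # (no such process, or no permission to read the directory).
    [ -L "$fd" ] || continue
    printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
  done
}

# Usage: list_fds 1234     (1234 is a placeholder PID)
```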

[https://en.wikipedia.org/wiki/Lsof](https://en.wikipedia.org/wiki/Lsof)

[https://en.wikipedia.org/wiki/Fuser_(Unix)](https://en.wikipedia.org/wiki/Fuser_\(Unix\))

Edited for grammar.

~~~
twic
One of my favourite hacks is this script (which might well not work - it's
flaky):

[https://bitbucket.org/twic/devtools/src/1b7a8f9ab849b36de70c...](https://bitbucket.org/twic/devtools/src/1b7a8f9ab849b36de70c6fe1cab2460fb2657c6d/bin/killfd?at=default&fileviewer=file-view-default)

Which, given a specification of a filehandle (eg an IP address or path) uses
lsof to determine the PID of the owning process and the numeric value of the
filehandle, and then uses gdb to attach to that process and close it. It's a
crude but simple way of exercising error-handling code.

~~~
vram22
Interesting one.

Are you running Elixir in the last line? the iex?

~~~
twic
That's this line, i think:

    
    
        sudo gdb -batch -n -iex "set auto-load off" -p $TARGET_PID -ex "call close(${TARGET_FD})"
    

The -iex is a flag which executes a command on startup:

    
    
        -init-eval-command command
        -iex command
           Execute a single GDB command before loading the inferior (but after loading gdbinit files). See Startup.
    

[1] [https://sourceware.org/gdb/current/onlinedocs/gdb/File-Optio...](https://sourceware.org/gdb/current/onlinedocs/gdb/File-Options.html#File-Options)

~~~
vram22
Got it now, thanks.

------
amelius
The secret weapon has a (serious) flaw on Linux though: you can't run strace
on programs that use strace. That's because the Linux ptrace call is not re-
entrant.

~~~
orivej
Actually you can strace strace! Try

    
    
      strace -o 1.log strace -o 2.log ls
    

and observe in 1.log how the second strace uses the ptrace call.

The limitation is that a process cannot be ptraced by multiple processes at
the same time. If you add _-f_ to the first strace, it will start tracing the
fork meant for _ls_ before the second strace has a chance to set up its
tracing. That setup will fail, and the second strace will kill the fork
instead of running _ls_. You can read this from _1.log_!

If I'm not mistaken, a ptrace tool may untrace its grandchild right when it
intercepts a child's attempt to trace that grandchild to make the attempt
succeed, but I don't know if strace can.

------
shinnok
A good complement to strace is ltrace(1):

    
    
      ltrace is a program that simply runs the specified command until it exits. 
      It intercepts and records the dynamic library calls which are called by the executed process and the signals which are received by that process. 
      It can also intercept and print the system calls executed by the program.
    

Also noteworthy: you can press the _s_ key in _htop_ to immediately attach
strace to the selected process and inspect it, which has been handy for me
many times.

------
thedatamonger
I was in awe when I first used strace. The world of tracing is vast these days
and filled with wonders. I can't possibly say it better than this guy so
I'll just include a link.

[http://www.brendangregg.com/blog/2016-03-05/linux-bpf-superp...](http://www.brendangregg.com/blog/2016-03-05/linux-bpf-superpowers.html)

------
lamby
Here's my strace hack from 2008:
[https://chris-lamb.co.uk/posts/can-you-get-cp-to-give-a-prog...](https://chris-lamb.co.uk/posts/can-you-get-cp-to-give-a-progress-bar-like-wget)

------
snissn
Is there a substitute for mac/osx?

~~~
zbentley
Dtruss:
[https://opensourcehacker.com/2011/12/02/osx-strace-equivalen...](https://opensourcehacker.com/2011/12/02/osx-strace-equivalent-dtruss-seeing-inside-applications-what-they-do-and-why-they-hang/)

Also, on OSX you have access to (mostly) the full power of dtrace, so you can
do diagnostics on closed-source, debug-stripped running programs that make
strace look small by comparison.

~~~
slrz
Just tried it on iTunes and it doesn't seem to work at all.

I already allowed myself to debug my own system by disabling "System
Integrity" in recovery mode.

~~~
tim--
A quick Google for `PT_DENY_ATTACH` and dtrace should solve that

------
adultSwim
This is my favorite strace resource:
[https://jvns.ca/strace-zine-v3.pdf](https://jvns.ca/strace-zine-v3.pdf)

------
ggm
strace is great. There are times this feels like the modern equivalent of
printf() debugging, because working out why a syscall is failing involves a
fair amount of back tracing to context sometimes. But, as a foot in the door?
It's ace.

------
partycoder
If you want to listen on a port under 1024 you require superuser access or
special privileges.

[https://www.w3.org/Daemon/User/Installation/PrivilegedPorts....](https://www.w3.org/Daemon/User/Installation/PrivilegedPorts.html)
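
When a daemon fails to start for this reason, an strace log normally shows the refused `bind` call with `EACCES`. A minimal sketch, assuming a log captured with something like `strace -f -o trace.log ./server` (placeholder names); `denied_ports` is a hypothetical helper:

```shell
# Hypothetical helper: read an strace log on stdin and print each port
# number whose bind() call was refused with EACCES (permission denied).
denied_ports() {
  grep -E 'bind\(.* = -1 EACCES ' \
    | sed -n 's/.*htons(\([0-9][0-9]*\)).*/\1/p' \
    | sort -un
}

# Usage: denied_ports < trace.log
```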

------
posharma
Is there a strace equivalent for windows?

