
Debugging Software Deployments with Strace - ingve
https://theartofmachinery.com/2019/11/14/deployment_debugging_strace.html
======
birdyrooster
Try out sysdig. It has many more features for domain-specific tracing using
what they call “chisels”, and it can do stuff like inspect threads and flows
attached to a container or Kubernetes resource.

Strace is still a very good tool, but also don’t forget about perf, which can
monitor kernel threads, print details when a C symbol is seen in execution,
and more.

~~~
gnufx
Also, perf trace has lower overhead than strace. I think Brendan Gregg has
numbers somewhere.

------
rachelbythebay
("Failed to open %s: %s", fn, strerror(errno))

Adapt to your own language. Don’t make your users hate you by forcing them to
resort to strace.

And I love me some strace. I just enjoy NOT needing it even more.
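
As one possible adaptation of that format string to Python, here is a minimal sketch. The function name `read_config` is hypothetical and not from the comment above; the point is only that the message carries both the path and the OS-level reason, so nobody needs strace to find out which file was involved.

```python
def read_config(path):
    """Read a text file; on failure, report both the path and the OS error."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except OSError as e:
        # Python equivalent of: "Failed to open %s: %s", fn, strerror(errno)
        # e.strerror holds the same text strerror(errno) would give in C.
        raise RuntimeError(f"Failed to open {path}: {e.strerror}") from e
```

Chaining with `from e` keeps the original exception (and its `errno`) available to callers that want to inspect it.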

~~~
Shish2k
Amen to that. I wonder how good the signal-to-noise ratio would be on a lint
rule to specify "error messages may not be static strings, you need at least
one variable" :P

(Maybe not for strictly resource constrained systems - but for something like
python, it just seems lazy to `raise Exception("Failed to connect")` without
saying what you're trying to connect to or what the error is...)
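
A rough sketch of what such a lint rule might look like, using only the standard `ast` module. The function name `find_static_raises` is made up for illustration, and this is not how any existing linter implements it; it only flags `raise SomeError("literal")` where every positional argument is a plain string constant.

```python
import ast


def find_static_raises(source):
    """Return line numbers of `raise X("literal")` statements in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Skip bare `raise` and `raise SomeError` (no call, no message).
        if not isinstance(node, ast.Raise) or not isinstance(node.exc, ast.Call):
            continue
        args = node.exc.args
        # Flag only calls whose positional arguments are all string constants.
        # f-strings with placeholders, %-formatting and .format() calls parse
        # as other node types, so they pass.
        if args and all(
            isinstance(a, ast.Constant) and isinstance(a.value, str) for a in args
        ):
            hits.append(node.lineno)
    return sorted(hits)


sample = (
    'raise Exception("Failed to connect")\n'
    'raise Exception(f"Failed to connect to {host}: {err}")\n'
)
print(find_static_raises(sample))
```

On the two-line sample only the first `raise` is reported. How noisy this would be on a real codebase is exactly the open question; messages that are legitimately static would all show up as false positives.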

------
eyberg
We turned on strace-like functionality (e.g.
[https://github.com/nanovms/nanos/issues/844](https://github.com/nanovms/nanos/issues/844)
) for Nanos a long time ago to figure out what to work on for various
applications.

Having said that, it's not something I advocate people do to fix production
issues, especially if they didn't write the code they are looking to
debug/fix.

~~~
downerending
Not sure whether I understood your comment, but for me, _strace_ has been an
extremely useful tool for debugging code that I didn't write. Even in the case
where source is available, it's often virtually impenetrable, and many times,
seeing that a program is attempting a series of system calls that is foolish
or guaranteed to fail can quickly provide insight into the problem.

------
pmoriarty
Another very useful tool along these lines is sysdig. It can do pretty much
everything in this article and more.

