Many thanks for building and releasing this. It's ridiculously easy to install (especially in virtualenvs) and very powerful. When I push '3' or '4', I get informative, stable output.
Minor feature request: an explicit 'pause' button would make it easier to copy file paths from the output. Ctrl-S is a reasonable alternative, but it's a little hacky.
Also, it would be nice to somehow eliminate time spent in poll() from the results. I'm profiling a server process and 99.9% of the time is spent in a poll function. Perhaps there could be an option to disregard time spent in system calls rather than user code. Most of the time I'm interested in profiling only user code.
(Actually, I've been looking for a reason to play with Rust code. Maybe I'll try to add these features myself!)
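In the meantime, the poll() time can be stripped in post-processing. A minimal sketch, assuming the "folded" stack format that flamegraph tools consume (semicolon-separated frames followed by a sample count); the frame names here are made up for illustration:

```python
# Hypothetical folded-stack samples from a mostly-idle server process.
folded = [
    "main;serve_forever;poll 9970",
    "main;serve_forever;handle_request;parse 20",
    "main;serve_forever;handle_request;render 10",
]

# Drop every sample whose leaf frame is poll(), keeping only user code.
user_only = [line for line in folded
             if not line.rsplit(" ", 1)[0].endswith(";poll")]

for line in user_only:
    print(line)
```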
Thanks! Both of your suggestions totally make sense. I've created an issue to track the poll() filtering request here: https://github.com/benfred/py-spy/issues/13 - I think that should be an easy fix.
This is really fantastic. I just found a 3x speedup on a compute-heavy job I run using py-spy - there was an unneeded hotspot in a library I'm using that I didn't previously suspect. It would have taken a long time with kernprof to dig through the call stack and find the issue.
This is a really great tool - the kind I didn't even know that I needed until I was given it!
One question, does anyone here know how to interpret the GIL (Global Interpreter Lock) percentage display in the top-left? In my code, "Active" sticks nicely at 100%, but the GIL jumps around from 1% to 100%, changing on every sample.
edit: Now that I think about it, my code spends a lot of time in C API calls - maybe the GIL is released there?
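That's likely it: many C-level calls drop the GIL for their duration, so a sampler would see the thread as active but not holding the GIL. A minimal illustration using time.sleep (which releases the GIL) as a stand-in for a long-running C call:

```python
import threading
import time

# Two threads each "work" in a C call for 0.2 s. Because time.sleep
# releases the GIL, they overlap and the total is ~0.2 s, not 0.4 s.
# A sampling profiler would report these threads as not holding the GIL.
def snooze():
    time.sleep(0.2)

threads = [threading.Thread(target=snooze) for _ in range(2)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
print(f"elapsed: {elapsed:.2f}s")
```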
I've been enjoying pyflame from Uber, which the author mentions in the project's description:
> The only other Python profiler that runs totally in a separate process is pyflame, which profiles remote python processes by using the ptrace system call. While pyflame is a great project, it doesn't support Python 3.7 yet and doesn't work on OSX or Windows.
> Py-spy works by directly reading the memory of the python program using the process_vm_readv system call on Linux, the vm_read call on OSX or the ReadProcessMemory call on Windows.
I think ptrace fundamentally lets pyflame do the same thing, and the same ptrace access permissions govern whether you can use process_vm_readv.
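For concreteness, here's a minimal Linux-only sketch of that syscall via ctypes. It reads this process's own memory, which needs no special permission; reading another PID is the identical call, gated by the same ptrace-style access checks:

```python
import ctypes
import ctypes.util
import os

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

class iovec(ctypes.Structure):
    _fields_ = [("iov_base", ctypes.c_void_p),
                ("iov_len", ctypes.c_size_t)]

# Target data to read "remotely" - here it lives in our own address space.
data = b"hello from the target process"
buf = ctypes.create_string_buffer(len(data))

local = iovec(ctypes.cast(buf, ctypes.c_void_p), len(data))
remote = iovec(ctypes.cast(ctypes.c_char_p(data), ctypes.c_void_p), len(data))

# process_vm_readv(pid, local_iov, liovcnt, remote_iov, riovcnt, flags)
nread = libc.process_vm_readv(os.getpid(),
                              ctypes.byref(local), 1,
                              ctypes.byref(remote), 1, 0)
print(nread, buf.raw)
```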
The real win for this project is the real-time "top" or "perf top" style UI instead of only generating flamegraph output. I love that feature, and it will be particularly good for quick "what is this process doing?" checks, as opposed to profiling a specific timeframe and analysing the resulting flamegraph (which is all pyflame lets you do).
If you're talking about the flame graphs, they're a fairly common feature of modern profilers. The oldest implementation I know of is at https://github.com/brendangregg/FlameGraph.
I've never understood why flame graphs are better than the normal presentation of inclusive and exclusive timings in performance tools, which may not be "modern" but embody some decades of experience. Anyone care to explain?
Of course I expect to see the call tree, and to flip between that and a flat view, to flip between different metrics in the views, and to see them across processes/threads. I'm talking about graphical tools (perhaps coupled with external reduction of the data) like CUBE [1], Paraprof [2], and those in toolsets like Open|Speedshop [3] and HPCToolkit [4] with which I'm less familiar.
I'm far from a performance expert, but my impression is:
A flame graph shows the call paths leading to each function and what share of time each path took, which isn't obvious from the typical table. On the other hand, spotting functions that are called a little all over the place and add up is easier in the table, so the table hasn't become useless.
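For concreteness, the entire input to a flame graph renderer is just a count per unique call path - the "folded" form. A tiny sketch with made-up frame names:

```python
from collections import Counter

# Hypothetical sampled call stacks (root -> leaf) from a sampling profiler.
samples = [
    ("main", "handle", "parse"),
    ("main", "handle", "parse"),
    ("main", "handle", "render"),
    ("main", "idle"),
]

# A flame graph is drawn from this: one row per unique call path,
# with the box width proportional to the sample count.
folded = Counter(";".join(stack) for stack in samples)
for path, count in folded.most_common():
    print(f"{path} {count}")
```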
> If you want to profile a multi-threaded application, you must give an entry point to these profilers and then maybe merge the outputs.
It basically boils down to the fact that multiprocess profiling is currently a giant pain: you have to manually attach the profiler yourself whenever you launch another process, and every profiled process produces its own output file.
It's not impossible, it's just very annoying. I've been vaguely meaning to write something that hooks the fork() call, automatically starts the profiler in the child process, and aggregates all the results back into a single output when the children exit.
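On CPython 3.7+ that idea could be sketched with os.register_at_fork - a minimal Unix-only sketch using cProfile, with arbitrary file names:

```python
import cProfile
import os

_prof = None

def _start_profiler_in_child():
    # after_in_child hook: every forked worker gets its own profiler.
    global _prof
    _prof = cProfile.Profile()
    _prof.enable()

os.register_at_fork(after_in_child=_start_profiler_in_child)

pid = os.fork()
if pid == 0:
    sum(i * i for i in range(100_000))              # child work gets profiled
    _prof.disable()
    _prof.dump_stats(f"profile.{os.getpid()}.out")  # per-pid output file
    os._exit(0)
else:
    os.waitpid(pid, 0)
    # The parent could later merge the profile.*.out files with pstats.Stats.
```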
Multi-process profiling (up to ~1M processes) is obviously bread and butter in the HPC world - that's primarily what the tools I referenced are for. The more recent Python support may not be as solid, though, especially if there's no good launch framework to hook into, which would be a good reason to use MPI.
Hah, and I'd love something like this for Common Lisp.
SBCL's sampling profiler is pretty finicky, and I haven't figured out yet how to use Linux-level profiling tools to get something useful out of a CL image.
And as a slightly different take from the person posting the issue: interfaces like KCachegrind are pretty clunky (if powerful, in their clunky way). A profiler that comes with some built-in presentation and reporting of its own, like the flamegraph and the real-time display, is a big win, and the lack of it is a serious deficiency in most Python profilers.
Does anyone know if there is something like this for Nodejs? Of course you can enable profiling, but it would be nice to be able to look at running processes.
Gave it a try, and wasn’t expecting to have to execute via sudo on macOS. I almost never use sudo, so this stands out as quite unexpected for a developer tool. Is there something off with my system, or are sudo privileges always going to be required on macOS?
Edit: I pip installed it into a virtualenv, if that matters.