Hacker News
Show HN: Py-spy – A new sampling profiler for Python programs (github.com)
286 points by benfrederickson 4 months ago | 43 comments



Many thanks for building and releasing this. It's ridiculously easy to install (especially in virtualenvs) and very powerful. When I push '3' or '4', I get informative, stable output.

Minor feature request: an explicit 'pause' button would make it easier to copy file paths from the output. Ctrl-S is a reasonable alternative, but it's a little hacky.

Also, it would be nice to somehow eliminate time spent in poll() from the results. I'm profiling a server process and 99.9% of the time is spent in a poll function. Perhaps there could be an option to disregard time spent in system calls rather than user code. Most of the time I'm interested in profiling only user code.
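A minimal sketch of the "ignore idle time" idea. It assumes each sample is a tuple of function names ordered from the outermost caller to the innermost (leaf) frame. The sample format, the `IDLE_LEAVES` set and `busy_profile` are all hypothetical illustrations, not py-spy internals or options. Samples whose leaf is a known blocking wait are dropped before the percentages are computed.

```python
from collections import Counter

# Hypothetical set of leaf functions treated as "idle" (blocked waiting on I/O).
IDLE_LEAVES = {"poll", "select", "epoll_wait"}


def busy_profile(samples, idle=IDLE_LEAVES):
    """Return {leaf function: percent of busy samples}, ignoring samples
    whose innermost frame is an idle wait. Each sample is a tuple of
    function names, outermost caller first, leaf last."""
    busy = [s for s in samples if s and s[-1] not in idle]
    if not busy:
        return {}
    counts = Counter(s[-1] for s in busy)
    return {fn: 100.0 * n / len(busy) for fn, n in counts.items()}
```

With 98 samples ending in `poll` and 2 ending in `handle`, `busy_profile` reports `handle` at 100% of the busy time instead of 2% of all time.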

(Actually, I've been looking for a reason to play with Rust code. Maybe I'll try to add these features myself!)


Thanks! Both of your suggestions totally make sense. I've created an issue to track the poll() issue here https://github.com/benfred/py-spy/issues/13 - I think that should be an easy fix.


I think a more robust solution might be to have counters for in-Python samples, outside-Python, and in-syscall.
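One way to sketch that suggestion: classify every sample into one of the three buckets and keep a counter per bucket. The `Sample` fields are hypothetical; how a profiler would actually detect native code or a blocking syscall is out of scope here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical flags; a real profiler would have to derive these itself.
    in_native_code: bool  # thread was executing outside the interpreter
    in_syscall: bool      # thread was blocked inside a system call


def classify(sample):
    # A syscall is also "outside Python", so check the narrower case first.
    if sample.in_syscall:
        return "in-syscall"
    if sample.in_native_code:
        return "outside-python"
    return "in-python"


def tally(samples):
    """Count samples per bucket: in-python, outside-python, in-syscall."""
    return Counter(classify(s) for s in samples)
```

Dividing each count by the total number of samples gives the percentages to display alongside the profile.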


The adaptations of HPC-type performance tools to Python and called non-Python, specifically parallel, libraries might be of interest:

TAU: https://www.cs.uoregon.edu/research/tau/docs/newguide/ch03s0...

Extrae/Paraver: https://www.researchgate.net/publication/317485375_Performan... llel_Python_Applications

Score-P/Scalasca: http://score-p.org https://github.com/score-p/scorep_binding_python


The localhost talk for rbspy (the inspiration for this project) is awesome: https://www.recurse.com/events/localhost-julia-evans

I imagine it also provides relevant insights into the structure of this tool.


This is really fantastic. I just managed to find a 3x speedup on a compute heavy job I run using Py-spy - there was an unneeded hotspot in a library I'm using that I didn't previously suspect. It would have taken a long time using kernprof to dig in through the call stack to find the issue.


This is a really great tool - the kind I didn't even know that I needed until I was given it!

One question, does anyone here know how to interpret the GIL (Global Interpreter Lock) percentage display in the top-left? In my code, "Active" sticks nicely at 100%, but the GIL jumps around from 1% to 100%, changing on every sample.

edit: Now that I think about it, my code spends a lot of time in C API calls - maybe the GIL is released there?


Wow, it tracks GIL contention? Major feature for me. I use a lot of Numba, and it releases the GIL if you want, so threading is actually useful.


I've been enjoying pyflame from Uber, which the author mentions in the project description:

> The only other Python profiler that runs totally in a separate process is pyflame, which profiles remote python processes by using the ptrace system call. While pyflame is a great project, it doesn't support Python 3.7 yet and doesn't work on OSX or Windows.

> Py-spy works by directly reading the memory of the python program using the process_vm_readv system call on Linux, the vm_read call on OSX or the ReadProcessMemory call on Windows.

I think ptrace fundamentally lets you do the same thing, at least in the way pyflame uses it, and the same ptrace access permission governs whether you can use process_vm_readv.
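For context, a minimal sketch of what "directly reading the memory" looks like on Linux: calling process_vm_readv(2) through ctypes. This is not py-spy's code (py-spy is written in Rust). It reads from the current process so it needs no extra privileges; per the man page, reading another pid is subject to a ptrace access-mode check (PTRACE_MODE_ATTACH_REALCREDS), which matches the point above. `read_process_memory` is a hypothetical helper name, and the sketch is Linux-only.

```python
import ctypes
import ctypes.util
import os


class IoVec(ctypes.Structure):
    # struct iovec from <sys/uio.h>
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.process_vm_readv.argtypes = [
    ctypes.c_int,                           # pid_t pid
    ctypes.POINTER(IoVec), ctypes.c_ulong,  # local_iov, liovcnt
    ctypes.POINTER(IoVec), ctypes.c_ulong,  # remote_iov, riovcnt
    ctypes.c_ulong,                         # flags (must be 0)
]
_libc.process_vm_readv.restype = ctypes.c_ssize_t


def read_process_memory(pid, address, size):
    """Copy `size` bytes starting at `address` in process `pid`."""
    buf = ctypes.create_string_buffer(size)
    local = IoVec(ctypes.addressof(buf), size)
    remote = IoVec(address, size)
    n = _libc.process_vm_readv(pid, ctypes.byref(local), 1,
                               ctypes.byref(remote), 1, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.raw[:n]
```

For example, `read_process_memory(os.getpid(), ctypes.addressof(src), 12)` on `src = ctypes.create_string_buffer(b"hello py-spy")` copies those 12 bytes back. A profiler does the same against a target pid, using addresses of interpreter structures instead of a local buffer.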

The real win for this project is a real-time "top" or "perf top" style UI instead of only generating flamegraph output. I love that feature, and it will be particularly good for quick "what is this process doing" checks, as opposed to profiling a specific timeframe and analysing the resulting flamegraph (which is all pyflame lets you do).
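A rough sketch of how a top-style table can be derived from stack samples. It counts how often each function is the leaf (its own, or exclusive, share) and how often it appears anywhere on the stack (its total, or inclusive, share). It assumes samples are tuples of function names, outermost first; `top_table` is a hypothetical name and this is not py-spy's implementation.

```python
from collections import Counter


def top_table(samples):
    """Return rows of (function, own %, total %), busiest first.
    'own' = samples where the function is the leaf (exclusive);
    'total' = samples where it appears anywhere (inclusive)."""
    own, total = Counter(), Counter()
    for stack in samples:
        if not stack:
            continue
        own[stack[-1]] += 1
        for fn in set(stack):  # count recursive functions once per sample
            total[fn] += 1
    n = len(samples)
    rows = [(fn, 100.0 * own[fn] / n, 100.0 * total[fn] / n) for fn in total]
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows
```

Recomputing the table over a sliding window of recent samples on every refresh is what turns it into a live "top"-style display.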

Nice work!


"top" for Python programs. That's pretty awesome. Not sure if this has existed in other tracers, but the output is great.


Rbspy, the inspiration for this tool, has a similar default output, which you can see in the documentation, https://rbspy.github.io/.


If you're talking about the flame graphs, they're a fairly common feature of modern profilers. The oldest implementation I know of is at https://github.com/brendangregg/FlameGraph.


This is the same script. The Rust code is just invoking Perl to generate it.


I've never understood why flame graphs are better than the usual presentation of inclusive and exclusive timings in performance tools, which may not be "modern" but embodies decades of experience. Anyone care to explain?


Flame graphs have extra information in the form of seeing the caller and callees. That, and I find the graphical presentation to be easier to scan.
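The caller/callee information comes straight from the input format. flamegraph.pl from the FlameGraph repo linked above reads "folded" stacks: one line per unique stack, frames joined by semicolons from root to leaf, followed by a space and a sample count. A minimal sketch of producing that format, assuming samples are tuples of function names, outermost first (`fold_stacks` is a hypothetical name):

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples into 'root;child;leaf count' lines,
    one line per unique stack, as consumed by flamegraph.pl."""
    counts = Counter(";".join(stack) for stack in samples if stack)
    return "\n".join(f"{stack} {n}" for stack, n in sorted(counts.items()))
```

Writing the result to a file and running `./flamegraph.pl stacks.folded > profile.svg` renders it. Because each line keeps the whole path, the same function reached through different callers stays in separate towers, which a flat inclusive/exclusive table collapses.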


Of course I expect to see the call tree, and to flip between that and a flat view, to flip between different metrics in the views, and to see them across processes/threads. I'm talking about graphical tools (perhaps coupled with external reduction of the data) like CUBE [1], Paraprof [2], and those in toolsets like Open|Speedshop [3] and HPCToolkit [4] with which I'm less familiar.

1. http://www.scalasca.org/software/cube-4.x/documentation.html

2. https://www.cs.uoregon.edu/research/tau/docs/newguide/bk01pt...

3. https://openspeedshop.org

4. https://hpctoolkit.org


I'm far from a performance expert, but my impression is:

It shows the call paths to the functions and how much time each path took, which isn't so obvious from the typical table. On the other hand, finding functions that are called a lot all over the place and add up is easier in the table, so the table hasn't become useless.


The latter can be accomplished with inverted flame graphs (sometimes called icicle graphs), which show the call stack inverted.
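A minimal sketch of the reversal, using the same folded-stack text format as flamegraph.pl but with frames joined leaf-first, so the bottom row groups time by the function that was executing, merged across all of its callers. `fold_inverted` is a hypothetical name; flamegraph.pl's `--reverse` option should do this reversal for you, while its `--inverted` option instead flips the drawing upside down, which is what that tool calls an icicle graph.

```python
from collections import Counter


def fold_inverted(samples):
    """Fold stacks leaf-first ('leaf;caller;root count') so a function called
    from many places becomes one wide box at the base of the graph.
    Each sample is a tuple of function names, outermost caller first."""
    counts = Counter(";".join(reversed(stack)) for stack in samples if stack)
    return "\n".join(f"{s} {n}" for s, n in sorted(counts.items()))
```

With samples `("main", "a", "log")`, `("main", "b", "log")` and `("main", "b")`, the two `log` lines share the prefix `log`, so `log` is drawn once with a width of two samples even though it had two different callers.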


Indeed, you want alternative views depending on what sort of profile or question about it you have -- with toggles in the GUI.


Oh that's great - I had it up and running in 30 seconds.


Does it support python multiprocessing?

Basically nothing out there does that I've found, and it's a really major pain-point for me.



> If you want to profile a multi-threaded application, you must give an entry point to these profilers and then maybe merge the outputs.

It basically boils down to this: (currently) profiling multiprocessing code is a giant pain in the ass. You have to manually attach the profiler yourself if you ever launch another process, and every profiled process produces its own output file.

It's not impossible, it's just very annoying. I've been vaguely meaning to write a thing which attaches to the fork() call, automatically starts the profiler in the child process, and aggregates all the results back into a single output when all children exit.


As a heads-up if you hadn't already seen it 3.7 added fork callbacks for stuff like this - https://docs.python.org/3/library/os.html#os.register_at_for... - much nicer than patching-and-praying.
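A minimal sketch of the fork-hook approach discussed above, using `os.register_at_fork` (Python 3.7+) with the standard library's cProfile and pstats rather than py-spy. `PROFILE_DIR`, `dump_child_profile` and `merge_profiles` are hypothetical names. Only processes created by `os.fork()` (for example multiprocessing's "fork" start method) run the hook; "spawn" children do not.

```python
import cProfile
import glob
import os
import pstats
import tempfile

# Hypothetical location for the per-process profile files.
PROFILE_DIR = tempfile.mkdtemp(prefix="profiles-")
_child_profiler = None


def _start_profiler_in_child():
    # Runs in the child immediately after os.fork().
    global _child_profiler
    _child_profiler = cProfile.Profile()
    _child_profiler.enable()


def dump_child_profile():
    """Stop the child's profiler and write profile-<pid>.prof.
    Call this before the child exits: os._exit() skips atexit handlers."""
    if _child_profiler is None:
        return None
    _child_profiler.disable()
    path = os.path.join(PROFILE_DIR, f"profile-{os.getpid()}.prof")
    _child_profiler.dump_stats(path)
    return path


def merge_profiles(directory=PROFILE_DIR):
    """Aggregate every per-process profile into one pstats.Stats object."""
    paths = sorted(glob.glob(os.path.join(directory, "profile-*.prof")))
    return pstats.Stats(*paths) if paths else None


os.register_at_fork(after_in_child=_start_profiler_in_child)
```

Once all children have exited, `merge_profiles().sort_stats("cumulative").print_stats(10)` prints one combined report, which covers the "aggregate back to a single output" step.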


Yeah, saw that while looking about for atfork stuff.

Ironically, I was involved in https://bugs.python.org/issue6721 which is one of the major bugs leading to the acceptance of https://bugs.python.org/issue16500, which is the patch including atfork().


Multi-process (<~1M) profiling is obviously bread and butter in the HPC world. That's what the tools I referenced are for primarily. The more recent Python targeting may not be so solid, especially if there's no good launch framework to hook into, which would be a good reason for using MPI.


It says you pass it a pid, so yes


This is great!

I also took the liberty to add this (and setuptools_rust) to Arch Linux's AUR: https://aur.archlinux.org/packages/python-py-spy


I would love something like that for Ruby :)

I am envious, well done!


Check out rbspy https://github.com/rbspy/rbspy (rbspy was the inspiration for this project =)


Hah, and I'd love something like this for Common Lisp.

SBCL's sampling profiler is pretty finicky, and I haven't figured out yet how to use Linux-level profiling tools to get something useful out of a CL image.

So I second the OP, well done!


Wonderful. Can the data it produces be munged into something KCachegrind can show?


Not yet - but I'm hoping to have a version that supports this next week. Will update this issue when it's done: https://github.com/benfred/py-spy/issues/3
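For anyone curious what that involves, a minimal sketch of a flat callgrind-format writer (self samples per source line, no call edges), which KCachegrind should be able to open, possibly only if the file is named like `callgrind.out.<something>`. It assumes each sample is a tuple of `(filename, function, lineno)` frames, outermost first, and charges only the innermost frame. `to_callgrind` is a hypothetical name; a full exporter would also emit `cfn=`/`calls=` lines so inclusive costs and the call graph appear.

```python
from collections import Counter


def to_callgrind(samples):
    """Return a minimal, flat callgrind profile as text.
    Each sample: tuple of (filename, function, lineno), outermost first."""
    counts = Counter(stack[-1] for stack in samples if stack)
    out = ["# callgrind format", "version: 1", "creator: sketch",
           "events: Samples", ""]
    for (filename, function, lineno), n in sorted(counts.items()):
        out += [f"fl={filename}", f"fn={function}", f"{lineno} {n}", ""]
    return "\n".join(out)
```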


Great, thanks!

And as a slightly different take than that of the person posting the issue: interfaces like kcachegrind are pretty clunky (if powerful, in their clunky way). The profiler coming with some built-in presentation and reporting of its own, like the flamegraph and the realtime display, is a big win; the lack of that is a serious deficiency in most Python profilers.


Does anyone know if there is something like this for Nodejs? Of course you can enable profiling, but it would be nice to be able to look at running processes.


Gave it a try, and wasn’t expecting to have to execute via sudo on macOS. I almost never use sudo, so this stands out as quite unexpected for a developer tool. Is there something off with my system, or are sudo privileges always going to be required on macOS?

Edit: I pip installed it into a virtualenv, if that matters.


https://github.com/benfred/py-spy#when-do-you-need-to-run-as...

tl;dr - yes, it's a limitation(?) of macOS syscalls.


Dangit. I was thinking of writing an AST inspector named pyspy.


why the dash, py-spy vs rbspy?


the name pyspy was taken on pypi already : https://github.com/tdfischer/pyspy =)


Just a guess but remove the dash and read it out loud.


?


It would rhyme with crispy.



