
Show HN: StackImpact – Python Production Profiler: CPU, Memory, Exceptions - dmitrim
https://github.com/stackimpact/stackimpact-python
======
galonk
I got excited but then read that it's some kind of cloud-based web application
thing.

Is there something like this (show memory use and call times for a Python
process) that just runs on my computer to help me profile a long-running
Python process?

~~~
alex-
psutil ([https://pythonhosted.org/psutil/](https://pythonhosted.org/psutil/))
is awesome for collecting valuable monitoring information.

pyrasite ([http://pyrasite.com/](http://pyrasite.com/)) will let you inject
code into a process. This can be used to add monitoring of private internal
state etc (if you have no other options).

If you want to have locally hosted graphs then grafana and influx are my
current tools of choice.

It is going to be more work than swiping a credit card, but not a crazy
amount.
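
For a quick look without any hosted service, a minimal sketch using only the standard library is below. It is an illustration under assumptions, not something taken from the tools named above: `cProfile` and `tracemalloc` are swapped in for psutil/pyrasite, and `workload` is a hypothetical stand-in for your own code.

```python
import cProfile
import io
import pstats
import tracemalloc


def workload():
    """Hypothetical stand-in for the code you want to profile."""
    data = [str(i) * 10 for i in range(50_000)]
    return sum(len(s) for s in data)


def profile_locally(func):
    """Run func once; return (result, call-time report, peak traced bytes)."""
    tracemalloc.start()
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func()
    finally:
        profiler.disable()
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
    return result, buffer.getvalue(), peak_bytes


if __name__ == "__main__":
    result, report, peak = profile_locally(workload)
    print(report)
    print(f"peak traced memory: {peak / 1024:.0f} KiB")
```

This profiles a single call rather than attaching to an already-running process; for a long-lived process you would have to trigger something like it from inside that process.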

~~~
willangley
pyrasite can make the child process slow down markedly; this came up for me
when injecting profilers into cinnamon-screensaver to try to root cause a
memory leak
([https://bugs.launchpad.net/linuxmint/+bug/1652489](https://bugs.launchpad.net/linuxmint/+bug/1652489))

cinnamon-screensaver would take multiple seconds to lock the screen even after
I'd stopped profiling and exited the interpreter I'd injected, and I wound up
restarting it so I could lock my screen quickly again.

I don't know why this happened, but it's enough to make me think twice, and
I'm definitely going to double-check my process is still performing as I hope
after injecting it with pyrasite in the future.

------
metalliqaz
This is an interesting project. It appears to me, based on the README and the
name, that it was primarily intended to profile backend services in a web app.

I have a question for Hacker News. Does Python still have a lot of momentum in
this area? I love Python and use it whenever I can, but I find these days that
most web frameworks assume right off the bat that you are using Node. The
frontend landscape is so heavily tilted towards using js (and tools such as
npm, etc) on the backend that fitting it into a Python flow is difficult,
especially for beginners. In addition, we have the relatively fresh trend of
using isomorphic code on both Node and the client. It seems like my beloved
Python is being pushed to the background. Is there any truth to this? I would
very much like to keep investing my energy into what I know, but if it is
wasted effort, I will stop.

~~~
cookiecaper
IMO having some Node.js and React experience is very helpful to get hired
into any position that is remotely near the web these days. I have
seen several of my clients introduce Node.js and of course everyone is
introducing SPAs in some form or another, mostly React. Mozilla is even
integrating React into Firefox's UI.

I am personally skeptical of these trends, but my skepticism doesn't change
the shift of the industry. My advice would be to get some Node and React
experience under your belt so that you can at least discuss it intelligently,
and it shouldn't be too much of an impediment moving forward.

Python is in a tough spot for growth, IMO. The new generation of languages
have internalized much of what made Python great, while leaving behind a lot
of the inadequacies and cruft attached to CPython.

Like you, I will always have a soft spot for Python, but it's getting
increasingly difficult to continue to see it as the default choice for new
projects (outside of a few specific niches).

~~~
metalliqaz
Thanks, good stuff to think about.

Do you think React is something to learn together with Node or would you
recommend just getting to know React on its own?

~~~
cookiecaper
Bolting React onto an existing codebase is probably best because it is more
similar to what you'd see in the real world. Most people don't start a
React/Node thing at the same time; they'll start integrating React into their
frontend early because they can get little React-compatible widgets plugged in
more easily than they can introduce backend changes like finding opportunities
for Node.

React and Node are not really related other than they're both JavaScript-
based, so the skills aren't really co-dependent. Node is used to execute
JavaScript locally (in build tools like Webpack, for example), but beyond a
small amount of local scripting for builds, they don't really touch (afaik; I
have not yet completed a major project with either of them, just used them
here and there).

------
LaurensBER
Seems really interesting but a lot of companies are not willing or not able to
send this to an unknown, untrusted party.

Would it be possible to host this on premise?

Sentry seems to do quite well with a business model where customers are free
to host it on premise. That might be worth a consideration.

I for one am interested but for me to become a customer I would first need to
be able to trial it on my staging environment. Providing a Docker container
that I can host on premise would go a long way towards being able to do that.

~~~
dmitrim
There is no on-prem offering yet, since there have been no requests for it so
far, at least with the Golang agent, which was introduced first. With the
Python agent we will reprioritise it. Thank you for the feedback! (Disclaimer:
I work at StackImpact)

~~~
drcongo
Also, it feels a little light on documentation and functionality for the
Python agent considering it's the same cost as using the Go agent. It's hard
to tell if I'm actually going to learn anything about what's causing the
memory leaks in my app as a lot of the functionality seems to be Go only.

------
eddd
I created a very similar project:
[https://github.com/fieldaware/liveprofiler](https://github.com/fieldaware/liveprofiler)
a few months ago.

Now, I am surprised I didn't push it forward.

------
rcarmo
This interests me a lot because I'm using Azure App Insights (full disclosure:
I work at Microsoft) after a couple of years of New Relic and I'm constantly
looking for better takes on the "let's instrument this code and profile it
remotely" thing, especially around gevent and asyncio (which have their own
little challenges).

I've been thinking about building my own using Prometheus as
collector/visualiser. Time hasn't been on my side, but eventually...

~~~
j_s
Is there a clear leader in the commercial space? It sounds like New Relic is
the most frequently mentioned, maybe I'm asking who's #2?

I am particularly interested in who best supports .NET on Windows.

~~~
ideaoverload
Try Dynatrace. Disclaimer: I used to work for them.

------
Sean1708
_> The agent overhead is measured to be less than 1% for applications under
high load._

Do you have the methodology and data that you used to obtain this figure?
Because to be honest I'm quite dubious, especially for an app which is CPU
bound.

~~~
dmitrim
We are measuring both the individual profiler overhead when active (printed by
the agent in debug mode) and the total CPU and memory overhead of the app
running over long periods of time with and without the agent.
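
A with/without comparison of this kind can be sketched roughly as follows. This is not StackImpact's methodology: `cProfile` stands in for the agent and `workload` is a hypothetical CPU-bound function, both assumptions made purely for illustration.

```python
import cProfile
import time


def workload(n=200_000):
    """Hypothetical CPU-bound workload made of many small function calls."""
    def step(x):
        return x + 1

    total = 0
    for _ in range(n):
        total = step(total)
    return total


def best_time(func, repeats=5):
    """Return the fastest wall-clock time over several runs, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def overhead_percent(func, repeats=5):
    """Time func bare, then with the stand-in profiler enabled."""
    baseline = best_time(func, repeats)
    profiler = cProfile.Profile()  # assumption: stands in for a real agent
    profiler.enable()
    try:
        instrumented = best_time(func, repeats)
    finally:
        profiler.disable()
    return 100.0 * (instrumented - baseline) / baseline


if __name__ == "__main__":
    print(f"overhead: {overhead_percent(workload):.1f}%")
```

Because `cProfile` hooks every call, it will typically show a far larger percentage than a sampling profiler would; the point here is only the shape of the measurement.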

~~~
orf
Are these apps under load? Is there really only a 1% difference when running
apache-bench or siege on the applications?

~~~
dmitrim
Yes, the apps were under simulated CPU load, memory allocations, etc. The good
thing with sampling profilers is that overhead stays relatively stable even
under high load.
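
To illustrate why sampling cost does not grow with load, here is a toy sampler built on a background thread and `sys._current_frames()`. It is a sketch of the general technique, not the StackImpact agent's implementation; `SamplingProfiler` and `busy` are hypothetical names.

```python
import collections
import sys
import threading
import time


class SamplingProfiler:
    """Toy sampler: periodically records which function each thread is in."""

    def __init__(self, interval=0.005):
        self.interval = interval
        self.counts = collections.Counter()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        own_id = threading.get_ident()
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self.interval):
            for thread_id, frame in sys._current_frames().items():
                if thread_id == own_id:
                    continue  # do not sample the sampler itself
                code = frame.f_code
                self.counts[(code.co_filename, code.co_name)] += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def busy(seconds):
    """Hypothetical CPU-bound workload."""
    deadline = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


if __name__ == "__main__":
    profiler = SamplingProfiler()
    profiler.start()
    busy(0.5)
    profiler.stop()
    for (filename, name), hits in profiler.counts.most_common(3):
        print(f"{hits:5d}  {name}  ({filename})")
```

The sampler does a fixed amount of work per tick, so its cost depends on the interval and the number of threads, not on how many calls the application makes.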

------
rburhum
Finally a NewRelic competitor... Their prices are killing me at scale

~~~
sciurus
There are plenty of New Relic competitors. Datadog and AppDynamics both have
APM products that support python, for example.

The feature set between this and New Relic is quite different. To
oversimplify, New Relic works at the python library level, and StackImpact
works at the python interpreter level. The functionality is potentially
complementary.

------
mooneater
I've been trying ways to profile my Django code on my dev server. It's using
runserver and postgres on VirtualBox (Ubuntu in Ubuntu) and takes 20s to
display a page. This is not due to slow db queries, those are quick. strace
says it's making a huge number of calls to:

    futex(0xe9d550, FUTEX_WAIT_PRIVATE, 0, NULL) = 0

I tried debug_toolbar, Silk, yet-another-django-profiler; these don't give me
insight into where all that time is going and where those mutex calls are
coming from.

Would this help? Any other suggestions?

Edit: Exact same code is hugely faster on a webserver in production. And it's
not the Vbox specs; I gave it lots of RAM and 4 CPUs.

~~~
jchw
My guess would be the filesystem, especially if you're using Vbox Shared
Folders (either directly or through Vagrant.)

If you're on Mac or Linux, you can massively reduce the amount of filesystem
overhead by using Docker (or Docker Compose) for local testing, since on Linux
it'll get direct access to the FS and on Mac it will use the special osxfs
driver. You can also try using nfs to mount your drives instead of vbox shared
folders if you want a quick gain, but it will make hot reloading even less
reliable.

You may also want to be sure your settings are really the same. Does DEBUG=0
change anything? What cache backend are you using? Etc.

Finally, if none of the above helps you can try a move of desperation: try to
get the app working on your native OS with no container or VM layer.

~~~
mooneater
Wow your suggestion really helped: Debug toolbar itself was causing this for
some reason. How embarrassing. Thanks!

~~~
jchw
That is rather surprising, because I don't have the same issue with a pretty
large production app. I wonder if there's a deeper underlying issue.

~~~
mooneater
Ok I upgraded toolbar from 1.7 to 1.8 and problem is gone :)

There are a number of toolbar bugs at
[https://github.com/jazzband/django-debug-toolbar/issues/943](https://github.com/jazzband/django-debug-toolbar/issues/943)
involving keywords "slow" and "hang", not sure which it was.

But the meta-issue is, the profiling tools I used were not super helpful in
discovering what the issue was.
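
When the usual profilers are silent, one generic fallback is to dump every thread's stack and see where it is parked. A minimal sketch with the standard library's `faulthandler` follows; it is not Django-specific and not how debug_toolbar or Silk work, and `dump_all_stacks` and `stuck_worker` are hypothetical names.

```python
import faulthandler
import tempfile
import threading
import time


def dump_all_stacks():
    """Return a text dump of every thread's current Python stack."""
    # faulthandler writes straight to a file descriptor, so use a real file.
    with tempfile.TemporaryFile(mode="w+") as handle:
        faulthandler.dump_traceback(file=handle, all_threads=True)
        handle.seek(0)
        return handle.read()


def stuck_worker(release):
    """Hypothetical worker that blocks until the event is set."""
    release.wait()


if __name__ == "__main__":
    release = threading.Event()
    worker = threading.Thread(target=stuck_worker, args=(release,))
    worker.start()
    time.sleep(0.2)           # give the worker time to block
    print(dump_all_stacks())  # shows the worker waiting inside stuck_worker
    release.set()
    worker.join()
```

On Unix, `faulthandler.register(signal.SIGUSR1, all_threads=True)` gives the same kind of dump on demand from a running dev server when you send it that signal.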

------
opticalfiber
Can you comment on how compatible this is with asyncio-based applications?
Looks like an interesting product and I'm going to try it out regardless, but
it would be cool to get some clarity since I couldn't find anything in the
docs besides this:

> Time (blocking call) profiler supports threads and gevent.

(from
[https://stackimpact.com/docs/#getting-started-with-python-pr...](https://stackimpact.com/docs/#getting-started-with-python-profiling))

~~~
dmitrim
We haven't tested the whole agent with asyncio applications yet. I guess only
the CPU profiler was tested during development. We'll test it and include the
results in the docs. For now, if you see any problems, please just open a
ticket. Thanks!

~~~
opticalfiber
First!
[https://github.com/stackimpact/stackimpact-python/issues/1](https://github.com/stackimpact/stackimpact-python/issues/1)
:D

------
iamnewhereqwer
There's another nice profiler for Python:
[https://github.com/nvdv/vprof](https://github.com/nvdv/vprof) (no history
view though)

------
Oras
How's it different from DataDog, considering you have the same price tag?

~~~
dmitrim
StackImpact is a set of profilers which continuously sample production
applications at low overhead. The result is line-of-code precision, not just
application-level metrics. I think it doesn't really compare to monitoring
tools such as DataDog. However, it also sends metrics (CPU, memory, GC).
(Disclaimer: I work at StackImpact)

~~~
sciurus
A more relevant comparison might be with their APM product.

[https://www.datadoghq.com/apm/](https://www.datadoghq.com/apm/)

------
activatedgeek
Now that this has come up, can somebody explain to me how profilers work? My
main concern is the overhead they add to the process.

~~~
alex-
PyCon 2017 had a really good talk about debuggers [1] which covered how PEP 523
[2] is making debugging Python 3.6+ code much faster. I think that a profiler
is somewhat similar, except that instead of potentially stopping execution on
each line, it collects data.

[1]
[https://www.youtube.com/watch?v=NdObDUbLjdg](https://www.youtube.com/watch?v=NdObDUbLjdg)
[2]
[https://www.python.org/dev/peps/pep-0523/](https://www.python.org/dev/peps/pep-0523/)
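
As a rough illustration of the "collecting data instead of stopping" idea, the sketch below uses `sys.setprofile`, the classic hook that the interpreter runs on every function call and return. Note this is the older hook mechanism, not the PEP 523 frame evaluation API from the talk, and `count_calls` and `fib` are hypothetical names.

```python
import collections
import sys


def count_calls(func, *args, **kwargs):
    """Run func under a profile hook; return (result, calls per function name)."""
    calls = collections.Counter()

    def hook(frame, event, arg):
        # 'call' fires on entry to every Python function; C functions
        # arrive as 'c_call' and are ignored in this sketch.
        if event == "call":
            calls[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return result, calls


def fib(n):
    """Deliberately call-heavy example function."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    result, calls = count_calls(fib, 15)
    print(result, calls.most_common(3))
```

The hook runs once per call, so the added cost scales with how call-heavy the code is, which is where the overhead concern comes from.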

~~~
activatedgeek
Interesting. Thank you for the link.

------
fgarnier
Are there any plans for porting this to Android? For example by using Kivy?

~~~
dmitrim
Current agent and the dashboard are designed for long-running applications,
such as servers or scripts. There are no plans for end user devices yet. But
because the agent is pure Python (it just relies on some system specific
functionality, such as signalling), it could work with a few tweaks.

~~~
fgarnier
Fair enough. One use case I could think of for this is testing games for long
periods on Android, and use this tool to find out where the bottleneck is if
there is any. Though given that this is designed for long running apps I can
see that it wouldn't be useful for quick profiling but rather for production
testing (QA stage etc). There's something similar to that already called
Gamebench but it isn't as detailed as this, so was keen on knowing whether
this tool would make it to Android :)

------
jedi_stannis
How would I configure StackImpact to run on my celery workers?

~~~
dmitrim
We haven't tested it with celery yet. It looks like it should work. gevent is
supported by the blocking call profiler, and the CPU and memory profilers as
well as exception and metric reporting are library independent.

------
baq
on premise pricing please

------
davidf18
...

~~~
craigds
wrong thread?

