
Monte Carlo profiling - mapleoin
http://stackoverflow.com/questions/3045556/how-to-profile-my-code/3068045#3068045
======
jgrahamc
It doesn't have to be a slow function either, it could just be a function
that's getting called a ridiculous number of times.

Back at Electric Cloud I was working on debugging an apparently live-locked
build process using our tool (this was back in 2004): make would run forever
with the CPU pegged.

It turned out that the particular build tree structure (at the file level) was
somewhat pathological for our tool and we were traversing the tree a
ridiculous (exponential) number of times relative to its size. We were doing
this to constantly calculate a value that depended on sub-tree size.

After ages trying to narrow down the bug I just went into the debugger and
broke in to look at a particular data structure. After doing this a few times
I noticed that I was always in the same function.

Memoizing the function fixed the problem.
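
The fix can be sketched in a few lines. This is hypothetical Python, not the actual Electric Cloud code; `Node` and `subtree_size` are made-up names standing in for the value that depended on sub-tree size:

```python
from functools import lru_cache

class Node:
    def __init__(self, children=()):
        self.children = tuple(children)

@lru_cache(maxsize=None)  # memoize: each node's size is computed once
def subtree_size(node):
    # Without the cache, shared sub-trees get re-walked on every query,
    # which is where the exponential blowup comes from.
    return 1 + sum(subtree_size(c) for c in node.children)

leaf = Node()
root = Node([Node([leaf, leaf]), Node([leaf])])
print(subtree_size(root))  # prints 6; the shared leaf is walked only once
```

With the cache, each distinct node is visited once, so the traversal is linear in the number of nodes rather than the number of paths through the tree.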

------
saurik
This is how common profiling tools (such as gprof) actually work.

"Profiling also involves watching your program as it runs, and keeping a
histogram of where the program counter happens to be every now and then."

"The run-time figures that gprof gives you are based on a sampling process, so
they are subject to statistical inaccuracy."

-- <http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html#SEC11>
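
On Unix that histogram fits in a few lines of Python (a sketch of the idea, not gprof itself): a SIGPROF handler tallies the function and line the interpreter happens to be executing each time the profiling timer fires. `busy` is a made-up workload for illustration:

```python
import collections
import signal

hits = collections.Counter()

def sample(signum, frame):
    # 'frame' is wherever the "program counter" happened to be.
    hits[(frame.f_code.co_name, frame.f_lineno)] += 1

signal.signal(signal.SIGPROF, sample)
signal.setitimer(signal.ITIMER_PROF, 0.005, 0.005)  # fire every 5 ms of CPU time

def busy(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

busy(5_000_000)
signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop sampling
for loc, count in hits.most_common(3):
    print(loc, count)
```

Like gprof's figures, the counts are subject to statistical inaccuracy, but the hot function dominates the histogram almost immediately.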

~~~
CountHackulus
This is also (roughly) how profiling JIT compilers (like most JVMs) work. The
method that gets the most samples is the one that has the more expensive
optimizations applied to it.

------
wulczer
Also known as Poor Man's Profiling (see <http://poormansprofiler.org/> or
[http://webcache.googleusercontent.com/search?q=cache%3Apoorm...](http://webcache.googleusercontent.com/search?q=cache%3Apoormansprofiler.org)
as it seems the site's down).

It's a pretty good technique, actually.
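
For a Python process you can approximate the same trick in-process (a sketch only; the real poor man's profiler is a gdb one-liner over `thread apply all bt`): a background thread periodically snapshots every other thread's stack via `sys._current_frames()` and tallies identical traces. `hot` is a stand-in workload:

```python
import collections
import sys
import threading
import time
import traceback

samples = collections.Counter()

def sampler(interval=0.01, n=30):
    # Take n snapshots of every other thread's Python stack.
    me = threading.get_ident()
    for _ in range(n):
        for ident, frame in sys._current_frames().items():
            if ident != me:
                stack = tuple(f.name for f in traceback.extract_stack(frame))
                samples[stack] += 1
        time.sleep(interval)

def hot(deadline):
    total = 0
    while time.time() < deadline:
        total += 1
    return total

t = threading.Thread(target=sampler)
t.start()
hot(time.time() + 0.5)  # the "mystery" workload being profiled
t.join()
for stack, count in samples.most_common(2):
    print(count, " -> ".join(stack))
```

The stack that shows up most often is your bottleneck, exactly as with the gdb version.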

------
rdtsc
$ pstack PID

Combine with watch

$ watch pstack PID

On RHEL6 pstack is in gdb RPM package.

(bonus: shows all threads)

~~~
jbert
Yes, I've used this and it's great for anything where you want a C-stack
backtrace.

As a variant, if you want a language-level backtrace in an interpreted
language, you can:

- set up a signal handler which dumps a (Perl, Python, etc.) level backtrace

- hit the program with the signal a few times

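In Python that handler is only a few lines (a sketch; `hot_loop` is a hypothetical stand-in for the code you suspect, and the stdlib faulthandler module can register a similar dump for you):

```python
import os
import signal
import traceback

def dump_stack(signum, frame):
    # Print the Python-level backtrace of whatever was executing
    # when the signal arrived.
    traceback.print_stack(frame)

signal.signal(signal.SIGUSR1, dump_stack)

def hot_loop():
    # Stand-in for the suspect code; in real use you'd run
    #   kill -USR1 <pid>
    # from another terminal a few times instead of self-signalling.
    os.kill(os.getpid(), signal.SIGUSR1)

hot_loop()
```
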
~~~
rdtsc
Great idea!

My Python server processes handle two signals during testing: SIGUSR1 & SIGUSR2.

SIGUSR1 : runs guppy memory usage & leak profiling
(<http://guppy-pe.sourceforge.net/>)

    
    
        import guppy
        _hpy = guppy.hpy()
        log(...,_hpy.heapu(),...)
        log(...,_hpy.heap(),...)
        log(..., <other _hpy info>...)
      
    

SIGUSR2 : open an interactive console using rconsole
(<http://code.google.com/p/rfoo/>) and let me log and muck around with live
objects while the server process keeps running. [NOTE: don't use on production
systems; it opens an RPC port that lets anyone control your process remotely!]

    
    
        from rfoo.utils import rconsole
        rconsole.spawn_server()
    

Then on the console:

    
    
        $ rconsole

------
jsnell
I can't help but boggle at the suggestion that in this day and age there are
programming environments where doing this is less work than running a real
profiler. I mean, come on, profilers that can't handle recursion or that can't
show finer than function-level granularity?

~~~
lutorm
Well, it means compiling with profiling, running, visualizing the output, and
then figuring out the hotspots. That takes some amount of work. (Or you run
with valgrind, in which case it runs so painfully slow that before you've
gotten to the slow part, you've gone home...)

Compared to that, firing up the debugger and hitting break is pretty easy. If
the hotspot is like core meltdown hot, it's probably faster.

~~~
jsnell
Plenty of tools don't require recompiling anything for profiling. Running the
program once for profiling seems like it should be less work than running and
interrupting it 10 times (heck, a lot of tools won't even require restarting
the program if it already happens to be running). Visualizing the output by
function or by something more fine-grained should be a matter of a single
command. And if it's easier to manually find the hotspots by looking at a
bunch of stacktraces than the profiler output, there's something badly wrong
with said output.

Now, I'm sure there are special circumstances where this method is the right
thing. But that's totally different from it apparently being suggested as best
practice to newbies.

------
cdavid
There are actually ways to profile Python functions on a per-line basis
(e.g. the line profiler from Robert Kern:
<http://packages.python.org/line_profiler>)
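
line_profiler instruments the function; for comparison, here is a crude stdlib-only sketch of per-line granularity (hit counts, not timings, so not a substitute for Kern's tool) using `sys.settrace`:

```python
import collections
import sys

line_hits = collections.Counter()

def tracer(frame, event, arg):
    # Called once per executed line while tracing is active.
    if event == "line":
        line_hits[(frame.f_code.co_name, frame.f_lineno)] += 1
    return tracer

def slow(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

sys.settrace(tracer)
slow(1000)
sys.settrace(None)
for loc, count in line_hits.most_common(3):
    print(loc, count)  # the loop lines dominate the counts
```
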

------
lini
Wow, I didn't know this technique had a name :) I used it today trying to
figure out why a SharePoint application was eating up the CPU - break into
debugger 6-7 times and 3 of those happened to be in the same function call or
its children in the stack trace - problem found!

~~~
wnight
My team was writing a Linux-based PABX, and a round of changes had caused
lockups - but a weird kind. The machine was there, but just not responding. It
wasn't dead, just maybe 100K times too slow. Yet when I hit Alt-SysRq to shut
the machine down, it responded instantly.

So I wrote an Alt-Sysrq handler that dumped the stack of the running process.
The failed one was usually the one running so it was pretty easy to find, and
then the stack trace told us enough about why it was deadlocked to make a fix
without needing more investigation.

It was essentially just a one-sample sampling (monte-carlo) profiler.

------
ot
It also works for finding out which loop a process is stuck in, and in that
case with 100% accuracy :)

------
eric-hu
For people who have done this with success: how many breaks into an app do you
consider a "few"? Wouldn't you need to repeat this enough times to make the
result statistically significant, or is it more of a rule of thumb?

------
ddlatham
For Java, running jstack a few times can be much faster and easier than
hooking up a profiler, and it will often show your bottleneck or problem right
away.

------
abhijitr
aka "sampling profiler" as opposed to "instrumentation profiler"

