

Greplin's code independence day: open source for Nagios, OOM debugging, and more - rwalker
http://tech.blog.greplin.com/code-independence-day-nagios-utilities-oom-di

======
justinsb
For the Java OOM analyzer, did you look at creating a heap dump automatically
(using -XX:+HeapDumpOnOutOfMemoryError) and then analyzing it with the Eclipse
Memory Analyzer Tool (<http://www.eclipse.org/mat/>)?

I can see that your approach is (presumably) much faster because it avoids the
need to dump all the memory, but I've found heapdump + MAT to be incredibly
powerful.
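Here is one way to act on that suggestion, sketched in Python on the assumption that the JVM is started from a wrapper script. `-XX:+HeapDumpOnOutOfMemoryError` and `-XX:HeapDumpPath` are standard HotSpot options. `jvm_command`, `server.jar`, the 4g heap and the dump directory are hypothetical placeholders. When `HeapDumpPath` names a directory, HotSpot writes `java_pid<pid>.hprof` inside it, and MAT can open that file.

```python
def jvm_command(main_jar: str, max_heap: str, dump_dir: str) -> list[str]:
    """Build a java command line that writes an .hprof file on OutOfMemoryError.

    jvm_command and its arguments are hypothetical; only the -XX flags and
    -Xmx are real HotSpot options.
    """
    return [
        "java",
        f"-Xmx{max_heap}",
        # Dump the heap the first time an OutOfMemoryError is thrown.
        "-XX:+HeapDumpOnOutOfMemoryError",
        # If this is a directory, HotSpot names the file java_pid<pid>.hprof.
        f"-XX:HeapDumpPath={dump_dir}",
        "-jar",
        main_jar,
    ]
```

For example, `" ".join(jvm_command("server.jar", "4g", "/var/tmp/heapdumps"))` gives `java -Xmx4g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp/heapdumps -jar server.jar`. Pass the list to `subprocess.run` to actually start the process.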

~~~
rwalker
We did, but for anything over about 1GB we found heap dumps prohibitively
slow. So instead of failing a single request, we'd end up effectively killing
the whole server.

~~~
SpikeGronim
I've had good experiences with 2GB heap dumps. Writing the dump was pretty
quick, essentially as fast as the machine could do that much sequential IO.
You also get a lot more detail, like what was in those Strings that took up
the whole heap, which is usually invaluable for diagnosing what's really
wrong.

Analyzing a multi-GB heap dump, on the other hand, takes 2-3x the dump's size
in heap for the analyzer. The jhat tool that ships with the JDK is perfectly
capable of analyzing a 2 GB heap if it has 6 GB of its own heap to play with.
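The sizing arithmetic in this comment can be sketched as follows. Treating the dump as pure sequential IO gives only a lower bound on how long it takes to write. The disk throughput is a parameter you would have to measure, and the 3x default is the upper end of the 2-3x rule of thumb above. `jhat` forwards `-J<flag>` options to its own JVM, so `-J-Xmx` sets the analyzer's heap. Note that jhat was removed in JDK 9. All function names here are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def estimated_dump_seconds(heap_bytes: int, disk_bytes_per_sec: float) -> float:
    """Lower bound on dump time, assuming writing the heap is purely sequential IO."""
    return heap_bytes / disk_bytes_per_sec


def analyzer_heap_bytes(dump_bytes: int, factor: float = 3.0) -> int:
    """Heap to give the analyzer, using the 2-3x rule of thumb (default 3x)."""
    return int(dump_bytes * factor)


def jhat_command(dump_file: str, dump_bytes: int, factor: float = 3.0) -> list[str]:
    """Build a jhat command line whose own JVM heap is sized from the dump."""
    heap_mib = analyzer_heap_bytes(dump_bytes, factor) // MIB
    # -J<flag> is passed through to the JVM that runs jhat.
    return ["jhat", f"-J-Xmx{heap_mib}m", dump_file]
```

For example, a 2 GiB heap at an assumed 200 MiB/s takes at least about 10 seconds to write (`estimated_dump_seconds(2 * GIB, 200 * MIB)` is 10.24). `jhat_command("heap.hprof", 2 * GIB)` yields `["jhat", "-J-Xmx6144m", "heap.hprof"]`, which matches the 6 GB figure above.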

------
e1ven
Thank you! The Nagios utils look very cool. I write all my Nagios checks in
Python locally, and paying Python's startup/teardown cost on every NRPE call
is getting to be a problem.

I'd been thinking "Hrmm.. I bet I could use Tornado to fire these off
inline.." and Bam, here it is ;)

Seriously awesome.
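The "fire these off inline" idea can be sketched with only the standard library. This is not greplin-nagios-utils' actual API. The check functions live in one long-running process and are dispatched by name, so no invocation pays for interpreter startup. The sketch uses `http.server` in place of Tornado so that it runs without dependencies. The `check` decorator, `run_check`, the `disk_root` check and its 80/90% thresholds, and the `/check/<name>` URL scheme are all hypothetical. Exit codes 0-3 follow the standard Nagios plugin convention.

```python
import shutil
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

CHECKS = {}


def check(name):
    """Register a function as a named check (hypothetical helper)."""
    def register(fn):
        CHECKS[name] = fn
        return fn
    return register


@check("disk_root")
def disk_root(warn_pct=80, crit_pct=90):
    """Example check: how full the root filesystem is (thresholds are assumptions)."""
    usage = shutil.disk_usage("/")
    pct = 100.0 * usage.used / usage.total
    if pct >= crit_pct:
        code = CRITICAL
    elif pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    return code, f"/ is {pct:.1f}% full"


def run_check(name):
    """Run a registered check in-process; return (exit code, status line)."""
    fn = CHECKS.get(name)
    if fn is None:
        return UNKNOWN, f"UNKNOWN: no check named {name!r}"
    try:
        code, message = fn()
    except Exception as exc:
        # A buggy check is reported as UNKNOWN instead of taking down the process.
        return UNKNOWN, f"UNKNOWN: {name} raised {exc!r}"
    if code not in STATUS_NAMES:
        return UNKNOWN, f"UNKNOWN: {name} returned invalid status {code!r}"
    return code, f"{STATUS_NAMES[code]}: {message}"


class CheckHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /check/<name>: first body line is the exit code, second the status line.
        name = urlsplit(self.path).path.rsplit("/", 1)[-1]
        code, line = run_check(name)
        body = f"{code}\n{line}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8111):
    """Block forever serving checks on localhost (the port is an arbitrary choice)."""
    ThreadingHTTPServer(("127.0.0.1", port), CheckHandler).serve_forever()
```

With this design, a tiny NRPE-side client only has to fetch `/check/<name>`, print the second line, and exit with the first. Each check becomes a function call in an already-warm process rather than a fresh Python start.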

------
nwmcsween
I recommend Icinga (<https://www.icinga.org/>) over Nagios. It's simply a fork
with quite a few updates, so greplin-nagios-utils should work with it as well.

------
koenigdavidmj
Getting a 404 on greplin-nagios-utils.

~~~
danicgross
Give it a try now! Apologies.

