
Debugging memory leaks in Ruby - frostmatthew
http://samsaffron.com/archive/2015/03/31/debugging-memory-leaks-in-ruby
======
est
Well, in the Python world we did exactly what OP described: use uWSGI to
respawn a worker after a certain number of requests.

Edit: alright, I got downvoted for stating a fact. Don't get me wrong, OP's
article is great, but not everyone has control over the underlying code, nor
enough time to tinker with runtime nuances.

To confess, I have maintained >100K LoC Python/Ruby projects, yet respawning
is the easiest and most effective mitigation available, and I think it's
likely to remain so for a very long time.
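The Ruby analogue of uWSGI's max-requests respawn is the unicorn-worker-killer gem (one of the worker killers the article calls out). Assuming that gem's documented config.ru usage, the setup looks roughly like this (all numbers illustrative, and `YourApp` is a hypothetical app constant):

```ruby
# config.ru -- sketch of unicorn-worker-killer, the Ruby analogue of
# uWSGI's max-requests. Numbers are illustrative, not recommendations.
require 'unicorn/worker_killer'

# Restart a worker after it has served between 3072 and 4096 requests
# (the range is randomized so workers don't all restart at once).
use Unicorn::WorkerKiller::MaxRequests, 3072, 4096

# ...or when its memory use crosses 192-256 MB.
use Unicorn::WorkerKiller::Oom, 192 * 1024**2, 256 * 1024**2

run YourApp # hypothetical application constant
```

Either way, the worker finishes its in-flight request before exiting, so the respawn is invisible to clients.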

~~~
gry
Lipstick on a pig. Exactly what the author calls out:

"Sadly, most Ruby devs out there simply employ monit , inspeqtor or unicorn
worker killers. This allows you to move along and do more important work,
tidily sweeping the problem snugly under the carpet."

The author describes a mindset to solve the underlying problem. Endemic to any
language.

~~~
mercurial
"I'm not a real programmer. I throw together things until it works then I move
on. The real programmers will say "Yeah it works but you're leaking memory
everywhere. Perhaps we should fix that." I’ll just restart Apache every 10
requests."

 _Rasmus Lerdorf_

More endemic in some languages than others.

~~~
codinghorror
I sourced that because I was curious... these quotes are terrifying:

[http://en.wikiquote.org/wiki/Rasmus_Lerdorf](http://en.wikiquote.org/wiki/Rasmus_Lerdorf)

I mean, I'd be the first person to tell you that I am a scripter at heart,
but I would never, ever say "I don't care about this [computer science] crap
at all." I feel like my job is to defer to the many programmers that are
smarter than I am, including the parser and compiler and linker, etc.

~~~
tormeh
"We have things like protected properties. We have abstract methods. We have
all this stuff that your computer science teacher told you you should be
using. I don't care about this crap at all."

I kinda agree with that sentiment. There's a lot of unnecessary stuff going on
in CS. We had extreme OOP and I fear extreme FP will spread into the
mainstream. The mission of programming language designers should be to help
programmers do their jobs better, not design something
elegant/interesting/powerful/expressive/[buzzword]. Those things are all good,
but only in the service of programs with fewer bugs and shorter
time-to-market.

~~~
the_af
> _The mission of programming language designers should be to help programmers
> do their jobs better, not design something elegant
> /interesting/powerful/expressive/[buzzword]. Those things are all good, but
> only in the service of programs with fewer bugs and shorter time-to-market._

That may be true for programming language designers, but I'm afraid you
misunderstand what CS is all about. Hint: it's not about time-to-market. The
job of CS is to explore the theory -- precisely how to do something in
interesting/elegant/expressive ways. Its job is to explore the theoretical
_and_ practical underpinnings of computing; to provide proofs; to explore the
abstract and formal systems behind software. CS is _not_ necessarily directly
applicable to enterprise software (though of course CS research does have
practical applications; and of course there is overlap between theoretical and
practical computing).

A lot of cognitive dissonance comes from people who look at CS thinking it's
what they need in order to write software.

------
shanemhansen
Sadly many Ruby, Python, and PHP devs will just use process supervisors as a
band-aid for the problem. I understand that some people who aren't really
programmers need to run a few scripts sometimes. For them supervisors make
sense. Your average shared hosting environment running wordpress with some
random custom plugins will probably benefit from a max of 10 or 100 requests
per child.

For anyone whose job description involves writing software and owning its
performance and reliability, process supervisors are a crutch. A crutch to be
used only until the memory leak is found and a patch submitted to the relevant
project.

A former colleague used to refer to the concept of software entropy. Like
physical entropy, it tends to increase. It's both natural and inevitable. But
as engineers we sometimes, for brief windows of time, have the ability to
impose order on chaos. To locally decrease entropy.

I've seen the end result of relying on process supervisors to work around
frequent crashes and memory leaks. You end up with Ruby processes with
multigigabyte heaps, and a stack that can serve only a few dozen requests/s
even on the beefiest of hardware.

To be clear, I'm not saying don't ever use process supervisors. It's just that
if they are saving your bacon multiple times a day something's wrong.

I'm also not claiming that inefficient software will cause your company to
fail. It's perfectly possible that your buffaloak architecture will keep
running until your company dies of natural causes, but I believe that we
should have enough professional pride to fight entropy, even if we're the only
ones who know what we did.

------
s0l1dsnak3123
Here's a Rack middleware I just wrote based on this article to use with
Datadog (and dogstatsd):
[https://gist.github.com/johnhamelink/cbef04581da5c3dd90be](https://gist.github.com/johnhamelink/cbef04581da5c3dd90be)
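The gist above is the real implementation; as a rough, dependency-free sketch of the shape such middleware takes (the class name and reporter are hypothetical here, and the gist reports to dogstatsd instead), it samples `GC.stat` around each request:

```ruby
# Hypothetical Rack middleware that reports per-request GC stats.
# The linked gist sends these to Datadog via dogstatsd; this sketch just
# hands them to a pluggable reporter callable.
class GcStatsMiddleware
  def initialize(app, reporter: ->(stats) { warn stats.inspect })
    @app = app
    @reporter = reporter
  end

  def call(env)
    allocated_before = GC.stat(:total_allocated_objects)
    response = @app.call(env)
    @reporter.call(
      allocated_objects: GC.stat(:total_allocated_objects) - allocated_before,
      heap_live_slots: GC.stat(:heap_live_slots)
    )
    response
  end
end
```

In a config.ru this would sit near the top of the middleware stack (`use GcStatsMiddleware`), so a steadily climbing `heap_live_slots` shows up in your metrics long before a worker killer has to step in.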

------
LaSombra
That's one of the things I like about the Java ecosystem: the tooling is very
mature. Heap dump analysis is easy, and being able to connect to a live JVM,
whether via JMX/remoting, Thermostat, or Oracle's Java Mission Control, makes
it almost painless to look at your JVM's internals and see what bottlenecks
you might be hitting.

I don't know how to get the same kind of information in Ruby, Python, or
node.js, but I'm pretty sure it might involve some gdb and debug symbols. No
idea.
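For MRI Ruby at least, the stdlib's objspace extension gets surprisingly close to a JVM heap dump: it can record allocation sites and dump every live object as newline-delimited JSON (tools like rbtrace can trigger the same dump in a running process). A minimal sketch, with the "suspicious" objects being a stand-in for whatever you're hunting:

```ruby
require 'objspace'

# Record the file/line that allocates each object from here on; the dump
# below will then include those allocation sites.
ObjectSpace.trace_object_allocations_start

# Stand-in for retained objects we might want to find in the dump.
retained = Array.new(100) { |i| "suspicious payload ##{i}" }

GC.start

# Dump every live heap object, one JSON document per line.
# (output: :string keeps the example self-contained; in practice you'd
# write to a file and dig through it with grep/jq or a heap-analysis script.)
dump = ObjectSpace.dump_all(output: :string)

puts "heap objects dumped: #{dump.lines.count}"
```

It's not Mission Control, but combined with `GC.stat` it covers a lot of the same ground.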

------
mattetti
Excuse my pedantry, but these aren't technically memory leaks. Great article
nonetheless :)

~~~
mikecmpbll
I'm curious what you mean by this!

~~~
cheald
Memory leaks are technically when you allocate memory, then lose the reference
to it without freeing it, thus leaving the memory permanently allocated
without a way to free it. Outside of a serious VM bug, you won't leak memory
in a GC'd language; instead, you can leak references (leaving objects alive
and uncollectable even after you've finished using them, because you've left a
reference to the object lying around somewhere without intending to). The
memory is still reclaimable, if you can eliminate the references keeping the
object from being GC'd.

This really is just pedantry though, because the net effect is "the memory
usage chart keeps going up and to the right" in both cases. "Memory leak" is a
fine enough term for what's happening in this case.
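A toy Ruby illustration of that distinction (class name hypothetical): nothing below is lost to the allocator; the objects stay live only because a class-level cache still references them, and clearing that reference makes the memory reclaimable again.

```ruby
# A "reference leak" in a GC'd language: the allocator loses nothing, but a
# forgotten class-level cache keeps every object alive across GC runs.
class ReportCache
  @cache = {}

  class << self
    attr_reader :cache

    def fetch(key)
      # Entries are added but never evicted -- the classic leaked reference.
      @cache[key] ||= "rendered report ##{key} " * 50
    end
  end
end

1_000.times { |i| ReportCache.fetch(i) }
GC.start
# All 1000 strings are still live: the GC can reach them via @cache.

ReportCache.cache.clear
GC.start
# Now the memory is reclaimable -- the "leak" was only a forgotten reference.
```

Contrast this with a C-style leak, where the pointer itself is gone and no amount of cleanup can get the memory back.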

~~~
fnord123
They are both memory leaks: unneeded memory is not being reclaimed. It's not
even pedantry; it's incorrectly narrowing the term to one type of problem when
it covers a whole range of them.

~~~
innguest
Relax. It's not pedantry. If I click on a link titled "Memory leak in Ruby"
I'm concerned the GC has a problem and I should avoid some particular method
call or something else because the GC is currently having trouble collecting
something in particular. That would be a memory leak.

The article is talking about handing data with references to live objects
over to some library that doesn't let go of those references, it seems. That
is a space leak, and it can and will happen in any language, regardless of
whether you're using a perfect GC or not.

So the difference is this:

Ruby has a memory leak -> not my fault, I'll just stop using the offending
method that leaks memory, thanks article!

Someone caused ruby to space leak -> duh, it's your fault, no need to write an
article about it, you should be letting your references be GC'ed if you don't
need them anymore, learn that memory is a scarce resource, etc.

Totally different problems. I was very confused reading the comments here
because people are afraid of being called a pedant. Folks, our trade comes
with a jargon. If you don't learn it you won't know what you don't know.

Saying simply "unneeded memory is not being reclaimed" is not constructive.
Why is it not being reclaimed? The reason matters to programmers, which is why
we invented different names for these occurrences.

~~~
fnord123
>Saying simply "unneeded memory is not being reclaimed" is not constructive.

Yes it is. It's a description of a problem your monitoring tool has reported
that needs investigation.

> Why is it not being reclaimed?

That's the first step of the investigation of the memory leak.

~~~
innguest
You can have space leaks without memory leaks (if you build up big enough
thunks/delayed computations/promises or otherwise run out of heap space) and
you can have memory leaks without space leaks (malloc 1 byte and lose its
pointer).

It's OK if you don't understand the difference yet - you'll get it eventually
- but there is a big difference.

~~~
fnord123
>It's OK if you don't understand the difference yet

Oh I understand the difference. But losing track of memory and accidentally
keeping memory in play are both called memory leaks. I would still call it a
memory leak if it's due to cyclic references or stale entries in a hash table
or other container.

------
innguest
> Considering that each entry could involve a large number of AR objects
> memory leakage was very high.

Ah. I thought I was going crazy. Memory leaks in a managed language (i.e. one
with a GC) are solely the fault of the GC implementer and cannot be fixed by
the language user.

What is being talked about here is a space leak.

