
How I spent two weeks hunting a memory leak in Ruby (2015) - Whitespace
http://www.be9.io/2015/09/21/memory-leak/
======
kyledrake
Interesting post, but I did want to chime in quickly and say that it's pretty
absurd to only have 1GB for a web application in 2016, even for a small one.
This is why I've been dismissive of using Heroku for my projects, even though
I run a lean stack. They have higher RAM options, but they are incredibly
expensive. For the cost of a 14GB Heroku Dyno, I can buy a dedicated server
off ebay with 32GB ECC every single month. I don't use all of that RAM, but
it's nice to have some extra RAM to toss around when you need to in a pinch.

I get that it means you have to "run a server", and _insert arguments for
expensive cloud providers vs DIY servers here_ but I don't think it's any less
crazy than being forced to chase a GC white whale for two weeks on a tiny
memory leak to avoid a huge rate hike on your hosting bill.

As an aside, I've only ever had one problematic memory leak with Ruby (the
infamously leaky RMagick). I threw this in every time I used the lib and it
solved it for me:

    
    
      GC.start(full_mark: true, immediate_sweep: true)
    

Bit of a performance hit, but not enough to cause a problem for my use case.
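
For anyone wanting to try the same thing, here is a minimal sketch of wrapping
that call so it runs after every use of a leaky library. The helper name
`with_full_gc` is made up for illustration; it is not part of RMagick or Ruby
itself, and the block body below is just a stand-in for the real work.

```ruby
# Run the given block, then force a full, non-lazy GC pass afterwards.
# `with_full_gc` is a hypothetical helper name, not a library API.
def with_full_gc
  yield
ensure
  # full_mark: true        -> major GC (mark every generation)
  # immediate_sweep: true  -> sweep right away instead of lazily
  GC.start(full_mark: true, immediate_sweep: true)
end

gc_runs_before = GC.count

# Stand-in for the leaky library call; the block's value is passed through.
result = with_full_gc { Array.new(1_000) { |i| i.to_s }.size }

puts "block returned #{result}, GC ran #{GC.count - gc_runs_before} time(s)"
```

The `ensure` means the collection still happens if the block raises, which is
the case where leftovers tend to pile up.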

~~~
mikekchar
> it's pretty absurd to only have 1GB for a web application in 2016

While I don't for a minute doubt what you say, as an old guy, the absurdity of
1GB being insufficient for serving up web pages hits me pretty hard.

~~~
kyledrake
Our main proxies (that do the majority of the heavy lifting over here) run
nginx and use, what, 50MB tops? If even? I throw them on $5/mo VPS instances
and they just blast out IO without any problems.

But nginx was carefully written in C over the course of many years, with a
general goal of hauling IO and static data around from other systems. Which is
all you really need for web pages, in the end. Solved one thing and holy crap
they did it well.

It's the business logic side of the web development world where things get
nasty. The tradeoff is you get a giant chest of libraries for solving any
stupid problem you want. I didn't have to for example figure out how to do IDN
encoding for domain validation. You throw enough of that crap together in a
single process and things start to get nasty. But how long would I have spent
carefully writing all of it in C? Let's just say I'd be out looking for
another job.

Worth the tradeoff, IMHO, but I think with projects like Crystal
([https://crystal-lang.org](https://crystal-lang.org)) in the future we will
discover we can get pretty close to both in the end. That combined with
Moore's law and I think we'll be good to go.

Assuming, of course, that cloud providers don't artificially restrict
available RAM through economic constraint (know the true costs - demand
better!).

~~~
mikekchar
I realised (thanks to the other poster) that my comment may have seemed
dismissive, which was not my intent. Just marvelling at it myself :-) But I do
wonder about this kind of stuff. I have been wrestling with some legacy
systems chewing up gobs and gobs of memory lately. Then wrestling with my own
code that was chewing up gobs and gobs of memory :-)

It's pretty easy to make mistakes because we don't really get punished, as you
say. But I wonder if we might be doing ourselves a disservice by not
restricting ourselves. Generally speaking, better code is better and while it
often _feels_ slower to write, my experience has been that it doesn't always
work out that way.

I might try working in a memory restricted container to see if it improves my
code...

------
bru
About the patch at
[https://github.com/vmg/redcarpet/pull/516/files](https://github.com/vmg/redcarpet/pull/516/files),
couldn't the update simply have been the following?

    
    
        -	return Data_Wrap_Struct(klass, rb_redcarpet_rbase_mark, NULL, rndr);
        +	return Data_Wrap_Struct(klass, rb_redcarpet_rbase_mark, xfree, rndr);

~~~
asveikau
The shortest code you can get away with is not always the best. If the
parameter is a structure (which I think it is here, behind the void pointer),
it's a good idea to give it its own free function in case later you add
additional struct members that will need freeing/cleanup in the same place.
(Bonus points if it appears in roughly the same place this thing is malloc'd)

I would have been even more explicit and added a cast to the expected
structure pointer type. Entirely useless, except to the human reader.

~~~
userbinator
That sounds like premature generalisation to me.

I'd say use a separate free function _when it needs one_ , which is not the
case yet (and might never be.)

~~~
asveikau
Call it an overreaction to a past project in which a predecessor scattered
free calls over several uses, then I had to add fields.

I don't mind adding 1 useless function per complex type if it saves me those
headaches even a small minority of the time, or for the next maintainer.
Opinions may differ, but that's me.

------
samsk
Shameless plug: I've made a malloc-hooking preload library for hunting such
bugs that automates most of the described work -
[https://github.com/samsk/log-malloc2](https://github.com/samsk/log-malloc2)

~~~
nateberkopec
This looks really, really cool. You should download the bugged version of this
gem and make a screencast showing how your tool makes this process easier.

------
busterarm
What a timely post.

I had a similar issue that I was tracing last week that did end up being my
Ruby code...and it turns out I was modifying a constant like in the example.

What a fun read! I've been learning more about Ruby's GC since 2.1 and this
got me looking even deeper -- definitely picked up a couple of new
tricks/tools from this. Thank you be9!

~~~
mwpmaybe
Freeze your constants!

~~~
busterarm
I used to do this with strings (now you don't have to), but this is great
advice!

~~~
mwpmaybe
Yeah, it's one of those rare Ruby gotchas. If you assign some kind of a data
structure to a constant, you'll get a warning if you try to reassign the
constant, but the data structure itself will remain silently mutable. To
disallow this behavior, simply freeze the data structure. If you have nested
data structures, you must freeze deeply.

    
    
      ENUM = [0, 1, 2].freeze
    
      LOOKUP = {
        :A => [1, 2, 3].freeze,
        :B => [4, 5, 6].freeze
      }.freeze
    
      # or:
    
      LOOKUP = {
        :A => [1, 2, 3],
        :B => [4, 5, 6]
      }.tap { |s| s.values.each(&:freeze) }.freeze
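
For deeper nesting, a recursive helper may be easier than freezing each level
by hand. This is only a sketch: `deep_freeze` and `SETTINGS` are hypothetical
names, and it only walks Hashes and Arrays, not arbitrary objects.

```ruby
# Recursively freeze a nested structure of Hashes and Arrays.
# `deep_freeze` is a hypothetical helper, not a core Ruby method.
def deep_freeze(obj)
  case obj
  when Hash
    obj.each do |key, value|
      deep_freeze(key)
      deep_freeze(value)
    end
  when Array
    obj.each { |item| deep_freeze(item) }
  end
  obj.freeze
end

SETTINGS = deep_freeze({
  retries: [1, 2, 3],
  hosts: { primary: ["a.example"], backup: ["b.example"] }
})

begin
  # Without the deep freeze this append would silently succeed.
  SETTINGS[:hosts][:primary] << "c.example"
rescue FrozenError => e # FrozenError needs Ruby 2.5+; older versions raise RuntimeError
  puts "blocked: #{e.class}"
end
```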

------
rgtk
There is no reason to use native extensions in scripting languages like Ruby
when program load is pretty low (especially when they are used for
performance reasons, not for bindings).

Great insight, though. The author described this experience as if it were a
great adventure... in hindsight, I suppose :-)

~~~
scott_s
Your criticism really applies to the gem authors, which to me does not make
sense. If I, as a library developer, want my library to be used on the
critical path in heavy-load situations, and the library is for a language like
Ruby, I'm going to figure out which parts of my library are
performance-critical and implement them in a native language.

~~~
haimez
You would think that you (in this case, "you" == "average library author in
the ruby ecosystem") would, but you won't. I'm fairly confident the problem is
similar, but likely worse in the javascript ecosystem.

Necessity is the mother of invention, and when you're the first person to
notice a memory leak, you're on your own.

Both ecosystems tout the low barrier to entry as a great benefit, but memory
leaks, mostly benign inefficiencies, and poor algorithmic efficiency become
at least par for the course, if not a crippling liability, when reality hits
your application like a freight train (or, say, a two-order-of-magnitude
spike).

------
co_dh
I had a similar experience hunting a memory leak in Python. It's a 64 GB
server, but we load everything from the database into memory. I didn't
succeed.

This experience made me ask myself: is it wrong to create a lot of objects in
memory, let them reference each other, and hope the garbage collector will
magically work?

If, instead of objects, we just put the data into memory in a more organized
way, like a (column) database, then we don't need a GC anymore.

OOP made programming easier by modeling the real world as objects. At the same
time, it made memory management harder, since objects aren't natural to the
hardware's memory model: a linear array of memory cells.

I hope that in the future there is a place for this in-memory-database
paradigm of programming.

~~~
firebones
It's called "data-oriented design". Look into the gaming industry, where
there's a lot of anti-OOP sentiment. They're primarily coming at it from the
standpoint of optimizing memory access for cache lines, but there are side
benefits that come from it around reducing memory allocation and usage, and
avoiding the cost of abstractions that come with hidden inefficiencies.
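
A rough illustration of the allocation side of this, in Ruby since that's the
language of the post. It stores the same data once as one object per record
and once as two parallel "column" arrays, and counts allocations with
`GC.stat`. The `Point` struct and the `allocations` helper are made up for the
example, and the effect depends on the values being small integers, which
Ruby stores without allocating.

```ruby
# Row-oriented: one heap object per record.
Point = Struct.new(:x, :y)

# Number of Ruby objects allocated while the block runs.
# `allocations` is a hypothetical helper built on GC.stat.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

n = 10_000

rows = nil
row_allocs = allocations do
  rows = Array.new(n) { |i| Point.new(i, i * 2) }
end

# Column-oriented: one array per field, no per-record objects.
xs = ys = nil
col_allocs = allocations do
  xs = Array.new(n) { |i| i }
  ys = Array.new(n) { |i| i * 2 }
end

puts "row layout:    #{row_allocs} objects allocated"
puts "column layout: #{col_allocs} objects allocated"
```

The GC still exists in the column version; it just has far fewer objects to
track.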

------
allendoerfer
Is there a simple way to notice your own subtle mistakes, when writing in a
foreign language? The missing articles in texts from Slavic people bug me more
than they should and I wonder what kind of (stylistic) errors I myself make.

English grammar is mostly just simplified German, but there are some
constructs that I know of but do not naturally use (e.g. "your _every_ …",
"to name _but_ two") - they somehow seem twisted to me. Another one is
compound words: I have given up on distinguishing between spaces, hyphens and
actual compounds and just use spaces almost everywhere.

------
wyldfire
These are always interesting bug hunts.

I tried to add ASan to the travis config for this project but I couldn't quite
figure out how to change CFLAGS and/or CC. Never used ruby but interwebs
hinted that the bundle config/install commands might accept "--cc" and
"--with-cflags" options. They were ignored when I tried them, though [1].

[1] [https://travis-ci.org/androm3da/redcarpet/jobs/159711520](https://travis-ci.org/androm3da/redcarpet/jobs/159711520)

------
pjleonhardt
That is fantastic dedication to finding the source of the memory leak. I'm not
sure I'd have quite the devotion or expertise to track that one down.

------
stretchwithme
Wouldn't it be useful if the community tested EVERY gem to see if it leaked
memory, perhaps each developer stepping up to test one, in an app that just
exercised the gem in question?

To test all of the ones we use individually is prohibitive. But thousands of
people could each test one.
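
A bare-bones version of such a test app might look like the sketch below. It
is only an assumption about how one could do it: the helper names are made
up, `rss_kb` shells out to `ps` (so Linux/macOS only), and "memory grew after
every batch" is a crude heuristic, not proof of a leak.

```ruby
# Resident set size of this process in kilobytes, read from `ps`.
# Assumes a Unix-like system where `ps -o rss= -p PID` works.
def rss_kb
  `ps -o rss= -p #{Process.pid}`.to_i
end

# Call the block `batch` times per round and record memory after each round.
# `measure` is injectable so the harness can be exercised without `ps`.
def memory_samples(rounds: 5, batch: 10_000, measure: method(:rss_kb))
  Array.new(rounds) do
    batch.times { yield }
    GC.start # drop ordinary Ruby garbage so what remains is more suspicious
    measure.call
  end
end

# Crude heuristic: every sample is larger than the one before it.
def steadily_growing?(samples)
  samples.each_cons(2).all? { |earlier, later| later > earlier }
end

# Demo with a fake measurement: an array that is appended to forever.
leaky = []
samples = memory_samples(rounds: 4, batch: 100, measure: -> { leaky.size }) do
  leaky << "x"
end
puts "samples: #{samples.inspect}, growing: #{steadily_growing?(samples)}"
```

For a real gem you would drop the `measure:` argument and put the gem call in
the block, e.g. rendering a small Markdown string with Redcarpet.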

------
ktRolster
A little worrisome that the fix hasn't been merged. _(edit: it's been fixed)_

~~~
richardwhiuk
Did you get all the way to the end of the blog post?

"Update on 2015/09/29. Redcarpet fix has been released."

~~~
ktRolster
oh, thanks. Disregard my previous comment.

------
ajsharma
Great read. What's more, I found out that my company's codebase was still
using the broken 3.3.2 version (code fix in the works).

Love it when reading HN pays off immediately like that :)

------
andrewvijay
Amazing stuff. I took time to understand this article by going through each
part in detail. Learnt quite some stuff. Thank you for this incredible story!

------
vemv
While the described debug process deserves nothing but my applause, I wonder -
could the issue have been more easily diagnosed?

Ideally much less tooling would have been needed.

------
darkhorn
Does PHP have similar issues? I think it shouldn't.

~~~
lotyrin
PHP extensions in C are pretty rare since it's so hard to distribute them
(outside a few exceptions that exist in common package management systems), so
in that sense, I guess?

But no, a PHP extension in C can have a memory leak like any other C code.
It's not at all uncommon actually, since there are fewer consequences; CGI
mode PHP cleans up after every request, FastCGI mode PHP by default restarts
workers after a number of requests because of the class of effects (unwanted
state) that includes memory leaks.

------
andrewchambers
Once you identified the leaked address, I think a reverse debugger could have
helped track the alloc down.

