
Phusion Passenger author here.

The article mentions that Unicorn's out-of-band garbage collection is problematic because the way it works - running the GC after every request, and requiring turning off the normal GC - is overkill. But the community is working on a better solution.

In particular, Aman Gupta described this very same problem and created a gem which improves the out-of-band garbage collector by only running it when it is actually necessary, and by not requiring one to turn off the normal GC. Phusion Passenger (even the open source version) already integrates with this improved out-of-band garbage collector through a configuration option. This is all described here: http://blog.phusion.nl/2014/01/31/phusion-passenger-now-supp...

Just one caveat: it's still a work in progress. There's currently 1 known bug open that needs reviewing.
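
To make the "only when actually necessary" idea concrete, here is a minimal sketch in plain Ruby. It is not the gem's code or Passenger's integration: the class name `ConditionalOobGc`, the `after_request` hook and the allocation threshold are all hypothetical, and the real heuristics are likely more refined. It only shows the general shape: leave the normal GC on, and trigger an extra collection between requests when enough objects have been allocated.

```ruby
# Hypothetical sketch, not the actual out-of-band GC gem.
class ConditionalOobGc
  attr_reader :runs

  # threshold: assumed number of newly allocated objects that justifies a GC.
  def initialize(threshold: 50_000)
    @threshold = threshold
    @last_allocated = allocated_objects
    @runs = 0
  end

  # Call this after the response has been sent to the client.
  # Returns true if a collection was triggered, false otherwise.
  def after_request
    return false if allocated_objects - @last_allocated < @threshold

    GC.start # the normal GC stays enabled the whole time
    @last_allocated = allocated_objects
    @runs += 1
    true
  end

  private

  # Objects allocated since process start (the key name differs on Ruby 2.1).
  def allocated_objects
    stat = GC.stat
    stat[:total_allocated_objects] || stat[:total_allocated_object] || 0
  end
end

oob = ConditionalOobGc.new(threshold: 50_000)
skipped = oob.after_request                  # hardly anything allocated yet
garbage = Array.new(100_000) { Object.new }  # simulate an allocation-heavy request
ran = oob.after_request                      # enough allocations: GC runs out of band
```

The point of the design is that idle or cheap requests pay nothing, while expensive requests get their garbage collected while no client is waiting.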

Besides the GC stuff, Phusion Passenger also has a very nice feature called passenger_max_requests. It allows you to automatically restart a process after it has processed the given number of requests, thereby lowering its peak memory usage. As far as I know, Unicorn and Puma don't support this (at least not out of the box; whether there are third party tools for this, I don't know). And yes, this feature is in open source.
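
As a rough illustration of the idea behind a request limit (not how Passenger implements it), the sketch below recycles a worker after a fixed number of requests. `RecyclingWorker`, `exhausted?` and the toy supervisor loop are hypothetical names, and treating a limit of 0 as "no limit" is this sketch's own convention. In a real server the replacement would be a fresh process, which is what actually returns the memory to the OS.

```ruby
# Hypothetical sketch of request-count-based worker recycling.
class RecyclingWorker
  attr_reader :handled

  def initialize(max_requests)
    @max_requests = max_requests
    @handled = 0
  end

  # Pretend to process a request and count it.
  def handle(request)
    @handled += 1
    "handled #{request}"
  end

  # A supervisor would replace the worker once this returns true.
  def exhausted?
    @max_requests > 0 && @handled >= @max_requests
  end
end

# Toy supervisor loop: swap in a fresh worker when the old one is exhausted.
worker = RecyclingWorker.new(3)
restarts = 0
7.times do |i|
  worker.handle("request #{i}")
  if worker.exhausted?
    worker = RecyclingWorker.new(3) # in a real server: spawn a new process
    restarts += 1
  end
end
```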




Based on what I've seen in NodeJS, isn't it about time that Ruby has some kind of temporary "Buffer" class that represents data in a way intrinsically different from String?

This would allow explicit clearing of the data in a way that would break how String works, but in the context of Buffer it would be allowed.

If the Ruby GC isn't cutting it for you, maybe old-school memory management is the way to go, right?


From the article, the issue is that the Ruby GC is triggered on total number of objects, and not total amount of used memory.
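
A small sketch of why counting objects and counting bytes can diverge, assuming MRI and the standard `objspace` extension. The sizes are arbitrary, and `ObjectSpace.memsize_of` is only an estimate:

```ruby
require 'objspace'

small = Array.new(1_000) { "x" }            # 1,000 tiny strings
large = Array.new(10) { "x" * 1_000_000 }   # 10 strings of about 1 MB each

# Estimated bytes held by each group of strings.
small_bytes = small.map { |s| ObjectSpace.memsize_of(s) }.inject(0, :+)
large_bytes = large.map { |s| ObjectSpace.memsize_of(s) }.inject(0, :+)

puts "small: #{small.size} objects, ~#{small_bytes} bytes"
puts "large: #{large.size} objects, ~#{large_bytes} bytes"
```

The second group is 100 times fewer objects but holds far more memory, so a trigger based purely on object counts would barely notice it.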

The article claims this got worse with the new generational GC algorithm, which sought to minimize the amount of work the GC has to do during the collection pause. By marking long lived objects as probably OK, you end up with fewer objects to free per GC cycle.

The problem, then, is that if you allocate some massive strings and they get marked as "old", they might never get collected by the GC. It's not clear to me whether the article author sees it this way, but apparently there is some kind of bug where some classes of objects get marked "old" by accident, and it sounds like it's exacerbated by the existing interpreter architecture.

Work on the Ruby interpreter is weirdly silo'ed off and mostly done by Japanese developers, so there's a significant barrier to entry for any enterprising C developer to roll her sleeves up and get hacking.

Anyhow, so, my understanding of your question is no, it would not help outside of really specific optimizations? It depends on what this Buffer class would do. Would they let you mark them as being young? Or avoid extra malloc calls because you have special information on the size of your strings? Kinda depends on how the generational algorithm was implemented? At this stage my knowledge grows thin.


> From the article, the issue is that the Ruby GC is triggered on total number of objects, and not total amount of used memory.

Which, it should be noted, has actually always been a bit of a problem with Ruby's GC. Particularly when it comes to C extensions that can't or won't use certain patterns to hint to the VM about how much memory they're really using.

I remember a really common memory issue with ImageMagick's extension along these lines back in the 1.8.x days. ImageMagick allocated huge objects and gave the Ruby VM very little insight into that fact because it used its own malloc. You'd wind up with a Ruby app that looked perfectly normal in terms of memory use, so it didn't trigger GCs often enough, and the IM objects wouldn't get cleaned up in a timely way. It looked like you had a huge leak, and you'd then spend days trying to figure it out.


> Work on the Ruby interpreter is weirdly silo'ed off and mostly done by Japanese developers, so there's a significant barrier to entry for any enterprising C developer to roll her sleeves up and get hacking.

This is wrong. Ruby developers welcome contributions in any form. Also, there are various resources to get started:

    Official Contributing Guide: http://ruby-doc.org/core-2.1.1/doc/contributing_rdoc.html
    Ruby Hacking Guide: http://ruby-hacking-guide.github.io/
    Book on ruby internals: http://www.amazon.com/Ruby-Under-Microscope-Illustrated-Internals/dp/1593275277
    RubySpecs: http://rubyspec.org/
Further, in case you are stuck, you can post on the mailing lists. Someone will surely help you get started.


It's entirely possible that it's changed in the last few years, but at the very least what he said was once very true. There has traditionally been a very real and very painful language barrier to the ruby core team.

But, to be fair, it's a bit of a goose and gander kind of situation. People everywhere else in the world have to deal with that kind of situation all the time.


It has, very significantly. There is still a Japanese-only mailing list (ruby-dev), but the English one has significantly more traffic (ruby-core). No decisions are made in ruby-dev that are not also discussed in ruby-core.

In addition, it's only that mailing list that's split; the bugtracker is in English, and the help is all in English (with one or two Japanese translations).

It's actually never been easier to contribute to Ruby.


That's good to hear.


I'm talking about side-stepping the GC engine completely by allowing a form of quasi-manual control over buffer data. Being able to call a method that actually immediately releases buffer data would, I think, help considerably, rather than hoping and praying that the GC eventually gets around to clearing it up.

It might also be possible to create a sort of "weak reference" String-type class where you can manually ditch the data associated with it and not wreck other references to it, they'd just revert to empty string.

Seems like this could be done without having to get down and dirty in the GC itself.


There's nothing really stopping you from doing this with a cext or just using a String with all its encodings set to BINARY. I don't really see why it would be helpful here, though. The Ruby VM is essentially abdicating the difficulties of managing a large block heap to the platform's implementation, and this wouldn't really change with a buffer class that could be similarly large.
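
For what it's worth, a plain String already gets close to the "manually ditch the data" behaviour. The sketch below assumes MRI, where `String#clear` drops the string's heap buffer right away instead of waiting for a GC cycle; `ObjectSpace.memsize_of` is used only as a rough indicator, and the variable names are illustrative:

```ruby
require 'objspace'

# A roughly 1 MB binary "buffer" held through two references.
buffer = ("x" * 1_000_000).force_encoding(Encoding::BINARY)
other_ref = buffer

before = ObjectSpace.memsize_of(buffer)
buffer.clear                       # mutates the object in place
after = ObjectSpace.memsize_of(buffer)

puts "before: ~#{before} bytes, after: ~#{after} bytes"
puts "other reference is now empty: #{other_ref.empty?}"
```

Every reference points at the same object, so they all see an empty string afterwards. Whether the freed memory actually goes back to the OS is still up to the platform's allocator.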


> running the GC after every request, and requiring turning off the normal GC - is overkill.

Sort of reminiscent of Erlang's per-process GC. Although, given what I've seen of Ruby's internals, I'm sure the similarities stop at the very highest conceptual levels.


What's the known bug?


The project appears to be called gctools: https://github.com/tmm1/gctools

And the known bug is probably "Allow dry-running the OOBGC": https://github.com/tmm1/gctools/pull/5

Without knowing anything about the project, it seems like the bug is particular to Passenger's use case.


The "bug" is not really a bug; it's a missing feature that is needed to run with Passenger. It works fine with Unicorn.




