If I understand the implementation correctly, this seems to have a favorably low ratio of complexity to performance improvement. It will be interesting to see whether it ends up landing in MRI. I'm eager to see some non-micro benchmarks against "real world" Ruby, which basically means the Rails test suite, startup time, etc.
My only concern would be what kind of impact this has on memory usage. Having all of those in-memory copies of method implementations hanging around probably comes at a cost. But in general the world seems to be OK with trading memory for speed these days. If Ruby is twice as fast for less than twice the memory, then we can easily consider it a win.
Anecdotally, it seems like servers these days have plenty of RAM to spare relative to CPU anyway. A semi-recent Rails app that ran on AWS EC2 m4.xlarge instances with 24 Puma workers only used 5.5GB of the available 16GB of RAM.
It also looks like the spirit of this optimization comes from the work Chris Seaton and others have been doing with Ruby and the Graal VM. It's great to see work like that having an impact on MRI.
"In Ruby..." [we could have]... " deoptimization. It allows us to reverse the just-in-time compilation process and go back to a simpler interpreter where all the checks Ruby needs are explicit, all the objects really exist, and we have code paths to handle anything."
It seems like Ruby has a whole lot of features that actually make optimising it very difficult (monkey patching et al.), and this is their attempt to work around that.
...You know, they could just limit monkey-patching, and drop call/cc and a few other features (not having call/cc opens a lot of options), and they'd have a dramatically easier system to work with, right?
When was the last time somebody used call/cc in Ruby code, anyway? In its current implementation, it's only useful for thought experiments and toys, and we have Scheme for that.
In short, deoptimization seems like a case of solving one of the sinister black tendrils of the problem, not the root cause.
The problem with limiting monkey-patching is that it's used extensively and with great variety in Rails (and other popular Ruby projects).
Monkey patching, done correctly, will "settle" over time. That is, once your Rails app has called all the dynamic finder methods it's ever going to use, and `require`d all of the `active_support` extensions it'll need, it's done monkey patching. Of course, there are programming mistakes that can open a huge can of worms, such as using `OpenStruct`s. Generally, anything that will smash the global method cache[1] is also going to require the kind of `iseq` deoptimization that this pull is about (in fact, smashing the global method cache is a simple case of deoptimization).
This is why deoptimization is so useful - it's almost never going to happen after your app has settled, but it's necessary to maintain correctness while `active_support` is being loaded and adding all of those yummy methods to core classes.
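To make the "settling" idea concrete, here's a minimal sketch of the kind of load-time core extension `active_support` is full of (the method name here is made up for illustration):

class Object
  # A hypothetical active_support-style core extension. It monkey patches
  # Object exactly once, while the file is being required; after load time
  # nothing is (re)defined, so caches and speculative optimizations settle.
  def presence_or(default)
    respond_to?(:empty?) && empty? ? default : self
  end
end

''.presence_or('fallback')      # => "fallback"
'value'.presence_or('fallback') # => "value"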
`call_cc` can go to hell though - I've never seen a program that uses it ;-)
Yeah, but there's got to be a less needlessly overcomplicated way of doing this. It's ridiculous.
As for call_cc (or, to use its "real" name, call/cc, or call-with-current-continuation): my usual response to those who propose dropping call/cc (usually proposing shift/reset or similar as a replacement) is "over my dead body." But that's Scheme: supporting weird experiments is part of Scheme's job, and there's at least one high-quality implementation of Scheme that uses Cheney-on-the-MTA compilation, which makes call/cc fast enough for Real Work. In Ruby, the situation is different: call/cc is slow and expensive, and Ruby is a language for getting stuff done, not for academic experiments with control structure.
Dropping call/cc would represent a mindset that would immediately make me skeptical regarding Ruby's future. Limiting monkey-patching would make me immediately drop Ruby. The language's value to me is that I can bend it to do lots of stuff in ways I understand and control. Going away from that removes the reason for Ruby to exist.
What you call "the root cause" is literally why I use Ruby.
The thing is, I don't think we need to kill everything to fix this. You can certainly fix the Ruby compiler so that monkey-patching is fast, and you don't have to deoptimize for this sort of thing.
Call/cc, OTOH, should be dropped from Ruby. It's caused a great deal of pain for the devs, and it's not nearly as important as it is in something like Scheme. They could implement the similar (though not as cool) shift/reset, and have much of the power of call/cc, an easier implementation, and perf that's practical.
> You can certainly fix the ruby compiler so that monkey-patching is fast, and you don't have to deoptimize for this sort of thing.
Do you have a concrete idea for how to support monkey patching without deoptimisation and have it be fast? If you do, congratulations: it's a major breakthrough in language research and you should write a paper.
I was kind of joking about that. Or something. Seeing as I'm not a compiler/interpreter expert, I honestly don't know what I was thinking when I posted that.
But I still think this is utterly nuts. Not that I'm denigrating your accomplishments: if it's really that hard a problem, then the fact that you solved it at all is impressive. But the fact that we have to go to those lengths at all is crazy.
Part of why it seems so nuts to me is probably that I don't know enough about Ruby's internals. As I understand it, a class's methods are kept in a data structure associating them with their identity (I think it was a hash table at some point, although this may have changed) - a vtable, in C++ parlance. When you monkeypatch a method, you're either adding a function to the vtable or changing which function a vtable entry points to. I don't know why you have to deoptimize to do that. Past-me should probably have understood the problem better before making claims he couldn't back up, but there's nothing I can do about it now, save apologize for being stupid and ask for an explanation. Which is what I'm doing now.
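For reference, the mental model I had was something like this - reopening a class just swaps an entry in its method table (toy class, obviously):

class Greeter
  def hello
    'hi'
  end
end

Greeter.new.hello  # => "hi"

# Monkeypatching: reopen the class and replace the method-table entry.
class Greeter
  def hello
    'howdy'
  end
end

Greeter.new.hello  # => "howdy"

Under that model, the patch is just a pointer swap, so I didn't see what there was to deoptimize.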
I'm not Chris, and he's smarter than I am, but I appreciate that you posted this. More people, myself included, should be better about fessing up when we do something silly.
V-tables solve one problem - method lookup. V-tables are faster than a hash table lookup, yes, but they still mean indirect memory access (so you have to go off and chase pointers around main memory just to make your call), and they're designed for languages where you know the methods that your classes have ahead of time and you know the class of each variable. In Ruby you don't have that. You can call any method on any object.
The only theoretical way I think it would work would be if you knew all the method names ahead of time, gave them all an index, and gave each class a v-table large enough to contain them all. In a Rails application I estimate this might be about 5,000 entries - at 8 bytes per pointer, that's 40KB per class. And with singleton classes lots of objects have their own class, so 40KB per object! And as I said, that doesn't work anyway, as the method names are not known ahead of time - they can be dynamically created, and the set of method names is infinite.
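To make that concrete, here's a sketch of both problems - method names minted from data at runtime, and a singleton class appearing for a single object (class and method names made up):

class Record
end

# Method names come from data, so no fixed table of names exists
# ahead of time.
%w[name email created_at].each do |attr|
  Record.send(:define_method, attr) { instance_variable_get("@#{attr}") }
end

r = Record.new

# Defining a method on one object gives it its own singleton class, so a
# fixed-size v-table would now be needed per object, not per class.
def r.audit!
  puts 'only this instance responds to audit!'
end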
But that was method lookup, and that isn't really our problem. I like to use this example from a real Ruby gem to illustrate the bigger problem.
def clamp(min, max, value)
[min, max, value].sort[1]
end
That clamps a value between a min and a max. Using v-tables we could look up the calls to #sort and #[] relatively quickly (well we can't because of the problems with v-tables described above, but I'll keep going) but even with a fast lookup this is still terribly slow code. It creates an array, it sorts it, creating another new array, and then indexes it.
To make that code fast we don't just need fast lookup. We need to take the logic from the sort routine and inline it into this method; remove the loop in sort and specialise it for just three entries; inline the methods that it calls, such as #<=> to compare the values; specialise the sort code for the fact that we only wanted the middle entry, removing the code that sorts the other entries; and finally remove the allocation of the two arrays.
We need to be able to automatically turn that Ruby code into this:
def clamp(min, max, value)
# deoptimise if things aren't as expected
if value < min
min
elsif value > max
max
else
value
end
end
(Pretend that the calls to #> and #< are simple operators like in C). There's not even any lookup there any longer - so lookup was never really part of the problem.
And crucially, we need to be able to reverse all that inlining if someone monkey patches any of those methods. V-tables don't help us do any of that, but deoptimisation allows us to be in the middle of executing code like that, and reverse the inlining and the specialisation, allocate the objects we removed the allocation of, and keep going after a monkey patch.
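For example, a monkey patch like this one, applied at any point while the optimised clamp is running, is exactly the event that forces deoptimisation (the counter is just for illustration):

class Integer
  def <(other)
    $lt_calls = ($lt_calls || 0) + 1  # e.g. count every comparison
    super                             # Comparable#< still works via #<=>
  end
end

The specialised clamp baked in the assumption that Integer#< was the original; once this runs, that assumption is stale and execution has to transfer back to the unoptimised version, even mid-method.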
The implementation of Ruby I work on, JRuby+Truffle, does that kind of optimisation for real.
Ah. Okay. So the problem is that if a class is monkeypatched, all existing instances of that class must also see the patch (given how Ruby works, there's no point otherwise), and so all code that was inlined now has to be un-inlined so the new, monkeypatched code can run. Now I actually get it.
The point I was discussing is actually independent of vtables - it would work just as well with any other method dispatch mechanism - but you probably already knew that, and it has nothing to do with the actual problem.
Out of curiosity, how does Squeak handle this? Or does it just not inline?
Call/cc is actually used by Sinatra for its error handling. There's not a good way to get its syntax without doing it that way.
(Is the syntax worth using call/cc? Eh, good question. But Sinatra's killer feature is, literally, its syntax, so I don't think I get to tell them "don't do it that way." Rails has an ugly workaround for not using call/cc that bites many, many app programmers who forget to return after calling render().)
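For anyone who hasn't seen it, callcc (from the standard continuation library) gives you exactly that kind of non-local exit. A minimal sketch - not Sinatra's actual implementation:

require 'continuation'

# Jump straight out of a loop with the answer, callcc-style: calling the
# captured continuation makes callcc itself return that value.
def first_negative(numbers)
  callcc do |found|
    numbers.each { |n| found.call(n) if n < 0 }
    nil
  end
end

puts first_negative([3, 1, -4, 1, -5])  # => -4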