My only concern would be what kind of impact this has on memory usage. Having all of those in-memory copies of method implementations hanging around probably comes at a cost. But in general the world seems to be OK with trading memory for speed these days. If Ruby is twice as fast for less than twice the memory then we can easily consider it a win.
Anecdotally, it seems like servers these days have plenty of RAM to spare relative to CPUs anyway. A semi-recent Rails app that ran on AWS EC2 m4.xlarge instances with 24 Puma workers only used 5.5GB of the available 16GB of RAM.
It also looks like the spirit of this optimization comes from the work Chris Seaton and others have been doing with Ruby and the Graal VM. It's great to see work like that having an impact on MRI.
If you want Rails-specific benchmarks for Ruby commits I suggest the `discourse_` benchmarks that are running specific chunks of the open-source Discourse Rails app: https://rubybench.org/ruby/ruby/commits?result_type=discours...
I really appreciate this view for the rails commits:
"In Ruby..." [we could have]... " deoptimization. It allows us to reverse the just-in-time compilation process and go back to a simpler interpreter where all the checks Ruby needs are explicit, all the objects really exist, and we have code paths to handle anything."
It seems like Ruby has a whole lot of features that actually make optimising it very difficult (monkey patching et al), and this is their attempt to work around it.
When was the last time somebody used call/cc in Ruby code, anyway? With the current implementation, it's only useful for thought experiments and toys, and we have Scheme for that.
In short, deoptimization seems like a case of solving one of the sinister black tendrils of the problem, not the root cause.
Monkey patching, done correctly, will "settle" over time. That is, once your Rails app has called all the dynamic finder methods it's ever going to use, and `require`d all of the `active_support` extensions it'll need, it's done monkey patching. Of course, there are programming mistakes that can open a huge can of worms, such as using `OpenStruct`s. Generally, anything that will smash the global method cache is also going to require the kind of `iseq` deoptimization that this pull request is about (in fact, smashing the global method cache is a simple case of deoptimization).
This is why deoptimization is so useful - it's almost never going to happen after your app has settled, but it's necessary to maintain correctness while `active_support` is being loaded and adding all of those yummy methods to core classes.
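To make the "settling" idea concrete, here is a toy model of a method cache guarded by a global epoch counter. It is only a sketch: `ToyVM`, its fields, and the symbols passed to it are made up for illustration and are not MRI's actual data structures. The point is just that redefinitions are the only thing that force the slow path again.

```ruby
# Hypothetical model of a call-site cache; not how MRI is actually laid out.
class ToyVM
  attr_reader :lookups

  def initialize
    @epoch   = 0   # bumped on every method (re)definition
    @methods = {}  # [class_name, method_name] => callable
    @cache   = {}  # [class_name, method_name] => [epoch, callable]
    @lookups = 0   # number of slow-path lookups performed
  end

  # Defining or redefining a method invalidates every cached entry at once.
  def define(klass, name, &body)
    @methods[[klass, name]] = body
    @epoch += 1
  end

  def call(klass, name, *args)
    key = [klass, name]
    epoch, body = @cache[key]
    unless epoch == @epoch
      # Cache entry is missing or stale: do the slow lookup and re-cache.
      @lookups += 1
      body = @methods.fetch(key)
      @cache[key] = [@epoch, body]
    end
    body.call(*args)
  end
end

vm = ToyVM.new
vm.define(:Integer, :double) { |x| x * 2 }
1_000.times { vm.call(:Integer, :double, 21) }
vm.lookups # => 1, everything after the first call hit the cache

vm.define(:Integer, :double) { |x| x + x } # a late "monkey patch"
vm.call(:Integer, :double, 21)
vm.lookups # => 2, one more slow lookup and then it settles again
```

Once definitions stop arriving, the epoch stops changing and every call stays on the cached path, which is the settled state described above.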
`callcc` can go to hell though - I've never seen a program that uses it ;-)
As for callcc (or to use its "real" name call/cc, or call-with-current-continuation), my usual response to those that propose dropping call/cc (usually proposing shift/reset or similar as a replacement) is "over my dead body," but that's Scheme: supporting weird experiments is part of Scheme's job, and there's at least one high-quality implementation of Scheme that uses Cheney-on-the-MTA compilation, which makes call/cc fast enough for Real Work. In Ruby, the situation is different: call/cc is slow and expensive, and Ruby is a language for getting stuff done, not for academic experiments about control structure.
What you call "the root cause" is literally why I use Ruby.
Call/cc, OTOH, should be dropped from Ruby. It's caused a great deal of pain for the devs, and it's not nearly as important as it is in something like Scheme. They could implement the similar (though not as cool) shift/reset, and have much of the power of call/cc, an easier implementation, and perf that's practical.
Do you have a concrete idea for how to support monkey patching without deoptimisation and have it be fast? If you do, congratulations, it's a major breakthrough in language research and you should write a paper.
But I'm just thinking this is utterly nuts. Not that I'm degrading your accomplishments: if it's really that hard a problem, then the fact that you have solved it at all is impressive. But the fact that we have to go to those lengths at all is crazy.
Part of why it seems so nuts to me is probably because I don't know enough about Ruby's internals. As I understand it, a class's methods are kept in a data structure that associates them with their identity (I think it was a hash table at some point, although this may have changed): a vtable, in C++ parlance. When you monkeypatch a method, you're either adding a function to the vtable or changing which function a vtable entry points to. I don't know why you have to deoptimize to do that. Past-me should probably have understood the problem better before he made claims he couldn't back up, but there's nothing I can do about it now, save apologize for being stupid and ask for an explanation. Which is what I'm doing now.
The only theoretical way I think it would work would be if you knew all the method names ahead of time, gave them all an index, and gave each class a v-table large enough to contain them all. In a Rails application I estimate this might be about 5,000 entries - so 40KB per class. And with singleton classes lots of objects have their own class, so 40KB per object! And as I said that doesn't work anyway as the method names are not known ahead of time - they can be dynamically created and the set of method names is infinite.
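A rough sketch of that arithmetic, and of why the set of names can't be pinned down ahead of time. The 8 bytes per slot is an assumption (a 64-bit pointer per entry), 5,000 is just the estimate above, and `Record` and `add_finder` are made-up names for illustration.

```ruby
POINTER_BYTES = 8      # assumption: one 64-bit pointer per table slot
METHOD_NAMES  = 5_000  # the rough estimate from above
TABLE_BYTES   = METHOD_NAMES * POINTER_BYTES # 40_000 bytes, about 40KB per table

# Method names can be invented while the program runs, so there is no
# fixed list to assign indices from.
class Record
  def self.add_finder(attribute)
    define_method("find_by_#{attribute}") { |value| "#{attribute}=#{value}" }
  end
end

column = "e" + "mail" # stands in for a name only known at run time
Record.add_finder(column)
Record.new.find_by_email("a@example.com") # works, though no slot was reserved

# Any object can also get its own singleton class with its own methods.
obj = Object.new
def obj.special
  :only_me
end
obj.singleton_class.instance_methods(false) # => [:special]
```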
But that was method lookup, and that isn't really our problem. I like to use this example from a real Ruby gem to illustrate the bigger problem.
    def clamp(min, max, value)
      [min, max, value].sort[1]
    end
To make that code fast we don't just need fast lookup, we need to take the logic from the sort routine and inline it into this method, and we need to remove the loop in sort and specialise it for just three entries, then inline the methods that calls, such as #<=> to compare the values, then specialise the sort code for the fact that we only wanted the middle entry and remove the code for sorting the other entries, and then we need to remove the allocation of the two arrays.
We need to be able to automatically turn that Ruby code into this.
    def clamp(min, max, value)
      # deoptimise if things aren't as expected
      if value < min then min
      elsif value > max then max
      else value
      end
    end
And crucially, we need to be able to reverse all that inlining if someone monkey patches any of those methods. V-tables don't help us do any of that, but deoptimisation allows us to be in the middle of executing code like that, and reverse the inlining and the specialisation, allocate the objects we removed the allocation of, and keep going after a monkey patch.
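As a hand-written illustration of the guard-and-fall-back shape (not what a JIT literally emits, since that happens on machine code and can switch paths mid-execution), here is a sketch using a small value class. `Temp`, `ClampSketch`, and the `assumptions_hold` flag are hypothetical names; the `method_added` hook stands in for the VM noticing a monkey patch.

```ruby
# Sketch only: the "inlined" fast path and its guard are written by hand.
class Temp
  include Comparable
  attr_reader :deg

  def initialize(deg)
    @deg = deg
  end

  def <=>(other)
    deg <=> other.deg
  end
end

module ClampSketch
  class << self
    attr_accessor :assumptions_hold
  end
  self.assumptions_hold = true

  # Generic path: allocates an array and calls sort, which calls Temp#<=>.
  def self.generic(min, max, value)
    [min, max, value].sort[1]
  end

  # "Compiled" path: Temp#<=> inlined as a plain integer comparison, no array.
  def self.fast(min, max, value)
    # Guard: fall back ("deoptimise") if anything is not as expected.
    unless assumptions_hold &&
           min.instance_of?(Temp) && max.instance_of?(Temp) &&
           value.instance_of?(Temp) && min.deg <= max.deg
      return generic(min, max, value)
    end
    if value.deg < min.deg then min
    elsif value.deg > max.deg then max
    else value
    end
  end
end

# Installed after Temp's own methods exist, so only later redefinitions
# (monkey patches) invalidate the assumption.
class Temp
  def self.method_added(name)
    ClampSketch.assumptions_hold = false if name == :<=>
    super
  end
end

lo = Temp.new(0)
hi = Temp.new(10)
ClampSketch.fast(lo, hi, Temp.new(-3)).deg # => 0, via the fast path

# A later monkey patch changes what "ordering" means for Temp.
class Temp
  def <=>(other)
    deg.abs <=> other.deg.abs
  end
end

ClampSketch.fast(lo, hi, Temp.new(-3)).deg # => -3, guard failed, generic path
```

Without the guard, the second call would still return 0, which is wrong under the patched ordering. That is the correctness problem deoptimisation exists to solve.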
The implementation of Ruby I work on, JRuby+Truffle, does that kind of optimisation for real.
I wrote a thesis about all of this http://chrisseaton.com/phd/
The point I was discussing is actually irrelevant to vtables, it would work just as well with any other method dispatch mechanism, but you probably already knew that, and it has nothing to do with the actual problem.
Out of curiosity, how does Squeak handle this? Or does it just not inline?
Yeah, now that I actually get what deoptimization is, it doesn't seem as crazy.
(Is the syntax worth using call/cc? Eh, good question. But Sinatra's killer feature is, literally, its syntax, so I don't think I get to tell them "don't do it that way." Rails has an ugly workaround for not using call/cc that bites many, many app programmers who forget to return after calling render().)