
Add bytecode cache to Ruby - ksec
https://github.com/github/ruby/pull/27
======
3JPLW
What's the relationship between github/ruby and ruby/ruby? It looks like
they've diverged quite far from each other, but that might just be an
artifact of which branches GitHub uses when comparing the two.

~~~
randall
Seems like this is GitHub's (the company's) fork of Ruby. ruby/ruby seems to
be the official Ruby.

~~~
YorickPeterse
Correct, GitHub uses their own fork to apply changes for their own needs.

~~~
3JPLW
Do you know how much they try to push their own changes back upstream?

~~~
sams99
Yes, Aman and Koichi work very closely. The main point of difference at the
moment is the method cache patches; Koichi is working on getting something
similar implemented in MRI.

~~~
ksec
Why something similar but not the same?

~~~
claudiug
That is a very good question, and I'm also curious to find out.

~~~
ksec
As if the Ruby community had more than enough resources to improve TWO
compilers / interpreters at the same time. Isn't it better to work together
than to reinvent the wheel?

Speaking of compilers, development of a JIT for Ruby has been very quiet for
months.

------
haberman
I wrote a benchmark that measures the speed of various VM parsers and the
speedup that precompiling brings. I found that precompiling was a huge speed
benefit: [http://blog.reverberate.org/2014/10/the-speed-of-python-
ruby...](http://blog.reverberate.org/2014/10/the-speed-of-python-ruby-and-
luas.html)

------
daurnimator
Interesting...

The Lua community has found that bytecode is actually _slower_ to load than it
is to generate from source: the extra latency of loading the (larger) bytecode
from disk/SSD/flash exceeds the CPU time to lex/parse.

~~~
ploxiln
On the other hand, Lua syntax is much simpler

(slightly out of date: [http://programmingisterrible.com/post/42432568185/how-
to-par...](http://programmingisterrible.com/post/42432568185/how-to-parse-
ruby))

~~~
daurnimator
> slightly out of date:
> [http://programmingisterrible.com/post/42432568185/how-to-
> par...](http://programmingisterrible.com/post/42432568185/how-to-parse-ruby)

Ouch.... Perhaps a lesson in making your language's grammar too complex: if
you do, you'll eventually have to pre-compile.

~~~
TheLoneWolfling
So has Ruby joined the ranks of languages that formally cannot be parsed due
to the halting problem?

I know Perl is in that category.

~~~
vidarh
I don't think so. There are a few things that look distinctly iffy in that
respect on the surface, but they are resolved.

e.g. is "foo" a method call or a local variable? You can't know in isolation,
but it doesn't matter at parse time: if it's part of a larger construct that
is only valid as a method call, such as when there's an argument list after
"foo", it is parsed as a method call. E.g.:

    
    
        foo = 1
        foo(42)
    

will result in:

    
    
        test.rb:4:in `<main>': undefined method `foo' for main:Object (NoMethodError)
    

I think all of the potential cases that might have otherwise made Ruby
impossible to formally parse are resolved in similar ways.

Now, there are certainly layering violations. The aforementioned example of
"foo" by itself can only be resolved by determining whether or not "foo" is in
scope as a local variable at the point it is referenced, for example, but you
can opt to defer the decision until after parsing.
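
To make that concrete, here is a small sketch using Ripper from MRI's standard library. The helper name `node_types` is my own invention for illustration; the point is only that the same identifier parses to a different node type depending on what the parser has seen before it and what follows it.

```ruby
require "ripper"

# Hypothetical helper: collect every node-type symbol in the parse tree.
def node_types(source)
  Ripper.sexp(source).flatten.grep(Symbol)
end

bare       = node_types("foo")             # nothing known about foo yet
after_set  = node_types("foo = 1\nfoo")    # foo was assigned earlier
with_args  = node_types("foo = 1\nfoo(42)") # argument list forces a call

# A bare identifier is a "variable or call" node, an identifier that was
# assigned earlier is a variable reference, and an identifier followed by
# an argument list is a method call even though a local named foo exists.
puts bare.include?(:vcall)
puts after_set.include?(:var_ref)
puts with_args.include?(:fcall)
```

The parser never needs to run the program to decide any of these; it only tracks which local names have been assigned so far in the current scope.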

------
Arnor
How significant is the performance impact on a mid-large Ruby on Rails
application?

~~~
fizx
It's mostly annoying in development when your mid-large Rails app takes 30s-2m
to start-up and/or reload after an edit.

~~~
vidarh
Most of the time that is likely due to bundler/rubygems stuffing your load
path full and causing thousands of unnecessary stat calls.

The actual time spent loading/parsing files is in most cases a tiny fraction
of the startup time of any large project using rubygems and bundler.

I counted several hundred thousand unnecessary stat calls on the biggest app I
have, and ended up with an ugly hack where we trimmed the load path around
each set of requires to only the paths needed by that specific gem.
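
A rough sketch of why the load path size matters, under a deliberately simplified model: assume a require tries each load path entry in order with each candidate extension until it finds a file. The helper `probes_until_found` and the fake gem directories are made up for illustration, and the real lookup in MRI and RubyGems has more moving parts than this.

```ruby
require "tmpdir"

# Simplified model of a require lookup: walk the load path in order and try
# each extension until a file exists. Returns how many paths were probed.
def probes_until_found(feature, load_path, extensions = [".rb", ".so"])
  probes = 0
  load_path.each do |dir|
    extensions.each do |ext|
      probes += 1
      return probes if File.file?(File.join(dir, feature + ext))
    end
  end
  probes # not found: every entry was tried with every extension
end

Dir.mktmpdir do |root|
  # Fifty fake gem directories; the file we want lives in the last one.
  dirs = Array.new(50) { |i| File.join(root, "gem#{i}", "lib") }
  dirs.each { |d| Dir.mkdir(File.dirname(d)); Dir.mkdir(d) }
  File.write(File.join(dirs.last, "wanted.rb"), "")

  puts probes_until_found("wanted", dirs)        # full load path
  puts probes_until_found("wanted", [dirs.last]) # trimmed to the one gem
end
```

Multiply the first number by the number of requires in a large app and the counts add up quickly, which is the effect the trimming hack was working around.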

~~~
foz
Or, as I've seen a few times, you have circular references in your Rails asset
includes.

------
JulianWasTaken
Hah. .pyc files are one of the worst parts of Python for developers.

export PYTHONDONTWRITEBYTECODE=true is the first thing anyone should be doing.

I guess it figures that we copy each others' mistakes.

~~~
michaelmior
I've been coding in Python for many years and I can count on one hand the
number of times that this has actually caused me any problems.

~~~
JulianWasTaken
What is the total amount of time you spent on those few occasions, including
the time it took to learn what had happened? And how many thousands of times
larger is that than the sum total of the time saved by caching bytecode
compilation every single time you loaded a pyc for every file you ever wrote?

~~~
michaelmior
Probably a few minutes spent. I'm not sure how to quantify the overhead of
caching compiled bytecode, but I'm guessing it beats that.

~~~
JulianWasTaken
It doesn't, which was my point :)

The overhead is _tiny_, less than milliseconds for sane modules. It's a
useless optimization, especially when it can be done at install time only,
say, as opposed to for every module import, but even that is somewhat silly.

~~~
michaelmior
Without seeing any numbers, it doesn't mean much to me. I'm assuming someone
much smarter than me has identified the benefit of bytecode caching and unless
it really gets in my way, I see no need to do away with it.

------
aaronbrethorst
Does anyone know offhand (or have a good educated guess) what the largest
monolithic Ruby codebase in existence is? Is it GitHub?

~~~
ksec
Cookpad from Japan?

[https://speakerdeck.com/a_matsuda/the-recipe-for-the-
worlds-...](https://speakerdeck.com/a_matsuda/the-recipe-for-the-worlds-
largest-rails-monolith)

50 million unique users / month, 15,000 req/sec

I posted this a while ago but it didn't draw much attention

[https://news.ycombinator.com/item?id=9161220](https://news.ycombinator.com/item?id=9161220)

------
claudiug
What are the advantages of adding a bytecode cache?

~~~
nnq
speculating: it allows you to still be fine with a slower parser. and this
means you can theoretically make the parser smarter (like make it do some
inference and catch some bugs at parse-time?) without worrying that much about
performance. but then again, I can't imagine what kind of bugs a parser could
catch for a language as dynamic as Ruby, so they're probably just doing it to
shave some milliseconds off program start-up time...

~~~
btown
With huge libraries (like SpreeCommerce, for instance), startup time can be
seconds or more even on recent hardware, and the vast majority of that code
isn't changing between runs. So if every gem was cached, they could see
radically faster times to load their test suites across their entire
organization. It's well worth it even without a smarter parser.
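
For anyone who wants to see the general idea, here is a minimal sketch of a bytecode cache built on `RubyVM::InstructionSequence`, which is MRI-specific. This is not how the linked pull request works; `load_with_cache`, the cache directory layout, and the digest-based key are all assumptions of mine for illustration.

```ruby
require "digest"
require "tmpdir"

# Hypothetical helper: compile a file once, store the serialized bytecode
# under a key derived from the Ruby version and the source text, and load
# the stored bytecode on later calls instead of parsing again.
def load_with_cache(path, cache_dir)
  source = File.read(path)
  key = Digest::SHA256.hexdigest(RUBY_VERSION + source)
  cache_file = File.join(cache_dir, "#{key}.bin")

  iseq =
    if File.exist?(cache_file)
      # Only safe for binaries this same Ruby wrote; the loader does not
      # verify its input.
      RubyVM::InstructionSequence.load_from_binary(File.binread(cache_file))
    else
      compiled = RubyVM::InstructionSequence.compile(source, path)
      File.binwrite(cache_file, compiled.to_binary)
      compiled
    end

  iseq.eval
end

Dir.mktmpdir do |dir|
  script = File.join(dir, "answer.rb")
  File.write(script, "21 * 2\n")

  puts load_with_cache(script, dir) # first call parses and writes the cache
  puts load_with_cache(script, dir) # second call skips the parser
end
```

Keying on the source digest means an edited file simply misses the cache, so stale bytecode is never loaded; the cost is reading and hashing the source on every load.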

------
Mojah
Is this the equivalent of APC or OPcache in the PHP world?

~~~
Freaky
PHP needs an opcache because its standard behaviour is to discard all state
after each request - including all compiled bytecode - kind of emulating the
CGI model.

Ruby web apps on the other hand tend to be run in a loop - the script never
exits after a request, it just goes back to the top of the loop to accept the
next request to serve.

e.g.
[https://github.com/rack/rack/blob/master/lib/rack/handler/fa...](https://github.com/rack/rack/blob/master/lib/rack/handler/fastcgi.rb#L27)

    
    
            FCGI.each { |request|
              serve request, app
            }
    

It's nice because it completely decouples your app's startup time from request
processing time, so you can do expensive setup stuff up-front without slowing
down each request. The disadvantage to that is there's not much pressure to
keep startup time low, since it's only happening once.
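
A toy version of that model, in case it helps. `LoopServer` is a made-up stand-in, not the Rack or FCGI API: the expensive setup runs once when the process starts, and every request afterwards reuses the result.

```ruby
# Hypothetical stand-in for a long-running app server process.
class LoopServer
  attr_reader :boots

  def initialize
    @boots = 0
    @app = boot
  end

  # Serve every request with the app that was built at startup.
  def run(requests)
    requests.map { |path| @app.call(path) }
  end

  private

  # Expensive setup (loading code, building routes) happens here, once.
  def boot
    @boots += 1
    routes = { "/" => "home", "/about" => "about" }
    ->(path) { routes.fetch(path, "not found") }
  end
end

server = LoopServer.new
puts server.run(["/", "/about", "/missing"]).inspect
puts server.boots
```

In the PHP model the equivalent of `boot` would run again for every request, which is why caching the compiled code matters so much more there.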

~~~
juliangregorian
Yeah man, PHP makes scaling easy, it's shared nothing by default, do you even
web scale bro? /s

