
So You Want To Optimize Ruby - jamesbritt
http://blog.headius.com/2012/10/so-you-want-to-optimize-ruby.html
======
gioele
Couldn't we get rid of some "features" that have little practical use and
create problems for implementers and make optimisation harder?

For example `$~` and `$_` variables. Yes, they are lazily loaded but they
force the VM implementers to jump through so many hoops in order to keep track
of them.

> In the case of $~ and $_, the challenge is not in representing accesses of
> them directly but in handling implicit reads and writes of them that cross
> call boundaries. For example, calling [] on a String and passing a Regexp
> will cause the caller's frame-local $~ (and related values) to be updated to
> the MatchData for the pattern match that happens inside [].
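
A rough sketch of what that quote is describing, in plain Ruby. The method names `first_number` and `frame_local_match_demo` are made up for illustration. The behaviour shown (a match inside `String#[]` landing in the caller's `$~`, and not leaking any further up) is how I understand MRI to behave, so treat it as a sketch rather than a spec:

```ruby
# Hypothetical helper: the Regexp match happens inside String#[],
# yet the resulting MatchData shows up in *this* method's $~.
def first_number(text)
  text[/\d+/]            # implicit write to the caller's frame-local $~
  $~ && $~[0]            # read it back in the same frame
end

# Hypothetical demo: $~ is local to each method frame, so the match
# performed inside first_number does not clobber the one made here.
def frame_local_match_demo
  "abc" =~ /b/           # sets $~ for this frame
  before = $~[0]
  inner  = first_number("order 42")
  after  = $~[0]         # still the match from this frame
  [before, inner, after]
end

p frame_local_match_demo
```

Nothing in `first_number` mentions `$~` before the read, which is the point: the write is implicit, so the VM has to be ready to route it to the right frame on any call that might match a pattern.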

~~~
headius
Indeed we could, and as a reply mentions some implementations do take
liberties with various Ruby features like $~ and $_ (in MacRuby they are
thread-local, which sorta works but is often broken behavior).

Part of my challenge as a Ruby implementer is to figure out which features
really must be implemented to the letter of the law and which ones can be
fudged without impacting anyone. ObjectSpace was an example that could be
fudged; it turned out nobody ever used it for anything other than iterating
over all classes in the system. I also often make my case to the MRI/ruby-core
folks and Matz himself as to features that are an unreasonable performance hit
for not enough benefit. Sometimes I win those arguments, sometimes I lose
them.
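
For reference, the "iterate over all classes" use looks roughly like this. `ObjectSpace.each_object(Class)` is a standard call; the helper name `loaded_classes` is made up for illustration:

```ruby
# Hypothetical helper: collect every Class object the VM currently knows about.
def loaded_classes
  ObjectSpace.each_object(Class).to_a
end

classes = loaded_classes
puts "#{classes.size} classes loaded"
puts classes.include?(String)
```

Walking classes only is a much narrower job than walking every live object, which is presumably why this case is easier to keep working than ObjectSpace as a whole.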

------
codewright
Warning! I made this comment because I thought this was an amazing and
informative blog post, but nobody said anything in the last 7 hours it's been
posted... :\

As a result, I've decided to toss out some thoughts I've had about Ruby,
programming languages, PL implementations, and Lisp lately.

This is a fairly interesting, dense, but accessible list of aspects of Ruby
that make it hard to optimize.

An interesting counter-example in terms of relative expressive power vs.
performance would be Common Lisp.

The main differentiator seems to be how _accessible_ the expressive power is.
Ruby just really doesn't demand much forethought or precision on the part of
programmer in terms of what should be in scope, what data is getting passed
around, how to even represent the data, let alone in memory.

That's saying a lot, given that the default data structure in 'classical' lisp
is a linked-list, which is a pathologically poor choice for today's hardware
for cache predictability. That is why Clojure emphasizes the generic
sequencing protocols/API and tries to nudge people towards using vectors for
everything.

This all adds up to a language that is "easy" and expressive, but not simple.

It leaves one to wonder why we haven't gotten much further in terms of raw
expression and concision at a high and low level than Python, Ruby, and Common
Lisp. There are others that deserve mention for being interesting and powerful
(Haskell, Clojure, Io) but for various reasons/limitations/trade-offs aren't
necessarily more concise/more powerful in the small (trivial example) or in
the large (managing state, scope, composing code from afar).

Clojure and Haskell pick some smarter defaults for the scale-up to larger
codebases, but don't do much to offset the loss in expressive power those
trade-offs incur. I think the emphatic focus on immutability and keeping as
much immutable as possible as is the case with Haskell and Clojure is the way
forward, but only Haskell has really made any attempt to explore how to
advance the power of expression in that mode.

Idiomatic Clojure code barely makes use of monads as it is.

To understand the scorn that Common Lispers have (I used to code primarily in
CL) for Clojure, Haskell et al, it's important to remember that they
principally care about what's _possible_ , not about what's idiomatic,
standard to the community/libraries, etc.

It has long been the case that you could write predominantly functional,
immutable, expression-based code in Common Lisp. Many did. What Clojure did
was formalize it, clean it up, etc.

Clojure didn't bring anything new to the table in terms of power. Not like
Haskell's type system. I spoke to David Nolen recently about predicate
dispatch and it seems as if progress on that front has even stalled.

The delay in progress on predicate dispatch is disappointing because I was
going to use the predicate dispatch work as the "selling point" for Clojure
for a few CL'ers I know. It would've been a genuinely distinguishing point as
while there has been tons of work on predicate dispatch in Common Lisp,
Clojure's core.logic foundation was shaping up to have a better API and be
more composable.

Instead I'm left with nothing I'm able to recommend with a straight-face to an
experienced Common Lisper who isn't interested in JVM integration.

So what's new? What do you really _win_ with Clojure other than better
libraries and community? Not much.

Oh and the uh...concurrency primitives? They're a canard. They're neat,
they're designed excellently (being that it's Rich Hickey), but they're just
not as impactful as other things are these days.

If single-node/instance performance was important, you wouldn't be able to do
it on the JVM. If scalability is important, shared memory on a single node is
irrelevant and it becomes a matter of implementing a partition tolerant
database.

The concurrency primitives would've been more important in an alternate
universe where we didn't use databases/data stores/filesystems for state
management and synchronization and instead did Smalltalk stateful servers.

That didn't happen and it was an extremely bad idea and we live in a world
where java.util.concurrent exists.

So the CL'er who codes like a bi-polar hermit doesn't care about Clojure.

You want to win the Lisp-community-at-large? Give them more power. It doesn't
have to be the trivialized imprecision of expression that comes with something
like Ruby. It doesn't have to be the minimalism of Io and Scheme. It doesn't
have to be native continuations.

It doesn't even have to be Shen or Haskell-style static typing.

What it does have to be...is more powerful for the lone cowboy. It's going to
be hard to convince a lot of them without really hitting a home-run on that
one metric. Especially in a post-CLOS/Art of the Metaobject Protocol world.

I think Rich Hickey knew from the beginning he wasn't going to win a huge
chunk of the CL'ers over. I don't think it mattered. I think what mattered is
that people stuck in the Java enterprise universe could have their sanity-
preserving escape hatch.

It's a mistake to believe Clojure was designed for CLers. It was designed to
save Python, Ruby, and Java people.

A similar case of mis-targeting is Go. Go itself was ostensibly a "systems"
language when it launched. They got called out on the absurdly inaccurate
labelling so they've since repositioned it as a general purpose language.
Which is accurate.

They thought they were going to compete with C++. That's nearly impossible
until someone comes up with a true successor to C and C++. Anything that
doesn't enable absolute control over memory and semantics will never succeed
them. There always has to be the "bottom" where you can exert near-dictatorial
control over the behavior of the computer. Many industries need this.

Go was never designed to enable that, it was designed to make writing simple
code simpler. That's all. And it's good at that. I was surprised when people
with the cultural wisdom and background such as Rob Pike were surprised that
they were getting _Python_ and _Ruby_ people instead of C++ people.

The raw contempt Go programmers have for debugging and interactive programming
is appalling. That alone turned me off completely to the community. Partly
because the only reason for that attitude is some off-hand statements Ken
Thompson made in Coders At Work and elsewhere. Cargo-cult at its most
definitive.

Why would a C++ programmer move to Go? They lose control of the memory, for
which they have many layers of abstraction they can resort to (malloc/free,
new/delete, RAII, unique/shared pointers, Boehm's GC which is basically
equivalent to pre-Go-1.1 GC, etc.)

They also lose the benefit of their years of investment and learning in C++
which offers them a dizzying depth of choices in abstraction layers. This has
obvious disadvantages. C++ is a terror to attempt to write safe and stable
code in.

It is however, extremely powerful and the "pay only for what you use" is
_essential_ for systems programming. My personal _preference_ is C since I
feel like I'm not aiming a bazooka at my face, but as a Common Lisper I
understand the _appeal_ of the power they feel they get from C++.

Even if the 'power' in C++ is of a different sort than that which you get from
Common Lisp.

People unwilling to recognize and understand the power, advantages,
disadvantages, and design trade-offs made by "scary" languages like C++ and
Common Lisp will never be able to fully learn the lessons borne.

So no, I'm not surprised Go didn't attract primarily C++ programmers. Not in
the slightest.

It's mostly been Java, Python, and Ruby programmers...for various different
reasons. Like Clojure.

Most of the programming languages of the last 5-10 years have been
"sidegrades". Enhancements of accessibility, of cultural idioms, of extracting
a different emphasis on a subset of semantics that previous languages
supported.

Have we really advanced beyond Common Lisp, Prolog, Smalltalk, and C++?

What does it mean to make a language more powerful than those? Is there a
limit to expressive power? What do languages like REBOL say about "normal"
languages? What about APL?

On a parting note, the only language that I've felt has truly expanded my
horizons and impressed me in the last ~5 years has been Haskell (although it's
much older than that). That said, I feel Haskell is a spiky, uncomfortable
urchin of a language that is highly optimized to a local maximum of expressing
problems in code. I learned a lot from Haskell, but I'll never be more
productive in it than Python or Clojure or Go or anything else.

Too awkward, if enlightening.

~~~
willvarfar
I, too, came here to comment just because I thought an excellent article
wasn't getting much discussion :(

I think the concurrency - both Clojure and Go trying to fix it - is a big
deal. I live in the world of the partition-tolerant databases and multiple
nodes and all and it's a super-crap horrid world where everything seems
permanently fundamentally broken. You can go a long way on a single node if
you just have good clean design. I pay a lot of attention to new stuff like
Spanner but right now all our choices are flaky. Related blog posts of mine
[http://williamedwardscoder.tumblr.com/post/18065079081/cogs-...](http://williamedwardscoder.tumblr.com/post/18065079081/cogs-bad)
[http://williamedwardscoder.tumblr.com/post/16399069781/googl...](http://williamedwardscoder.tumblr.com/post/16399069781/google-moresql-is-real)
etc

Regarding your point that Rob & co couldn't correctly define "system language";
well, surely their definition ought to trump the interweb? If they consider
writing servers 'systems programming', then I'd say it was ;) I mean, they
didn't make the claim in a vacuum; it's presumably what Google internally calls
a 'systems language'?

That C/C++ people don't flock to Go is a shame, because C/C++ is so
unnecessary for so many projects. It's really rare to need C, and never
necessary to use C++ (and I've worked on C++ kernels back in the day).

There, hopefully my comment is hyperbolic enough for someone to take issue
with a point and discuss ;) I do stand by my general thrust though

~~~
codewright
C/C++ are necessary for a lot of things, things that Go isn't capable of doing
or wouldn't be very good at doing.

Go isn't a systems programming language. They called it that because
_compared_ to a language like Python it's closer to the hardware.

A systems language has various properties, like strict memory layout and
management control, varying degrees of control over code output, being able to
bootstrap an OS without depending on a wrapper OS / bootstrap VM / extensive
stdlib, etc. Go fails to meet any realistic definition of a systems language.

Go is instead a compiled alternative to VM languages like Java and Python.

Black is not white just because Rob Pike said so. He's a brilliant man but
he's always been a bit off to the side in terms of not really being in 'sync'
with everybody else. This is just an uncommonly concrete and inarguable
example of that.

He himself backed off the 'systems' descriptor when he realized his error.

You're fighting a bogeyman that doesn't exist.

>because C/C++ is so unnecessary for so many projects

People don't really default to C/C++ for anything they shouldn't anymore. They
live way higher up the stack by default by _FAR_.

C is generally only used where appropriate (kernel, embedded, drivers,
services like Redis, etc.)

C++ is the province of Microsofties, game developers, database coders, browser
implementors, and OKCupid employees. (Mild exaggeration)

I mean seriously.

When was the last time you heard of somebody doing a GUI desktop app or web
app in C or C++?

Dropbox is Python, most Ubuntu/Gnome/GTK apps are Mono/C#, most web apps are
Java/PHP/Python/Ruby, frontend web code is inescapably JavaScript with their
ongoing attempts to shoehorn Node.js onto the server-side. Some weirdos and
Perl6 hopefuls even use Perl for their primary language these days.

Node.js is as good an example as any of the "sidegrades" I mentioned in my original
comment, except it's spectacularly awful in so many ways that I have to wonder
if it's trying to draw even more uninformed and informed detractors than C++.

So who exactly is using C for everything on the planet? It's a strawman.
Nobody does that anymore. People have become comfortable with resorting to
higher levels of abstraction as appropriate.

Mostly because of Perl, PHP, Java, Python, and Ruby.

In the end, performance and control either matter or they don't.

If they do matter, you need that last-mile escape hatch that will never fail.

If they don't, pick something "fast enough" that makes sense for your personal
tastes that suits the problem and have a ball.

There will always and forever be a place for languages _like_ Fortran, Ada, C,
and C++ even if they themselves might not survive the next century.

~~~
willvarfar
all good points, well made.

I find it disappointing that we've come to label systems languages as the
languages with the properties to implement the 'systems' of yesteryear instead
of the systems of tomorrow; Singularity, for example. One wonders, if Pike and
co were to build a new OS, just where they'd put the managed/unmanaged line.

Wikipedia says <http://en.wikipedia.org/wiki/System_programming_language> "The
distinction between languages for system programming and applications
programming became blurred with widespread popularity of C and Pascal." and
I'd hazard that Pike was smearing that blur wider by saying Go. The Go FAQ
<http://golang.org/doc/go_faq.html> still starts by saying "No major systems
language has emerged in over a decade" and implicitly saying Go is to be it.

Now to my point that C/C++ is usually a bad choice. Most C++ projects I've
seen have been misguided, legacy choices from the late 90s when C++ was the
rage. Then again, I've always been mystified by the appeal of STL, since it
only just got a hashtable. I have always ended up building higher-performance
custom collections for stuff and found STL with its iterator-instead-of-ranges
clunky.

I've written buckets of C/C++ code for OS and multimedia projects, some fairly
recently; but even then, I'd rather have written only the innermost mechanics
of them in C and lashed them together in something that was nice to work with,
had good C interoperability (no I don't think python tooling is nice) and didn't
completely throw out all performance concerns. And until Go came along, there
really wasn't anything. Python and so on are only useful for coordinating
stuff when the task is embarrassingly and massively parallel.

Along comes Go; you can still x-ray some Go code and have a pretty shrewd idea
what the resulting machine code looks like, just like C/C++. It's within
spitting distance of C/C++ code for performance and will improve. And as the
concurrent scheduling is all in the hands of wizards, one expects it to scale
multi-core rather better than something working at the lib-level that requires
more discipline to use.

Having said that, I'm actually using Java for a current no-compromise-
performance rewrite-from-Python project now. It's Go but it's tried and tested.
Still, I am not enjoying myself too much.

FWIW, had a fun bug in production the other day; turns out I must be the
biggest user of Python in a sense, since I discovered - via a crash - that
multiprocessing is using `select()` internally and will fail if it gets a
descriptor over 1024. How come nobody else has discovered this? I see only
one person has, and the fix is in ... 3.3. Just a fun story.

~~~
codewright
I'm sorry to be this blunt, but it's quite apparent that you haven't done any
serious systems programming. That you were surprised by the syscall underlying
multiprocessing...says a lot although I could point to other indications.

Please don't make assertions as to the necessities of a specialty you haven't
delved in, it makes it very difficult for me to be civil and respectful.

There is...no real comparison between C/C++ and Java for last-mile performance
at the bleeding edge, even if Java is a VERY good high-performance replacement
for Python for stuff like network services.

There are a variety of other use-cases where C/C++/Fortran/Ada are extremely
necessary. Also note how abysmal Java has been for Android compared to Obj-C
with its ever-present escape-hatch to raw C for making responsive and pleasant
mobile apps. Especially games.

That aside, stuff like Singularity is research-grade only for very clear and
obvious reasons. It isn't practical yet, if ever it would be.

Losing a lot of potential performance and control over the machine at the
universal OS level is pretty...damaging and limiting.

It's bad enough that an entire programming language ecosystem can put a cap on
your power over the computer and the performance/behavior thereof, to do that
at the _OS_ level except as a research project is unconscionable.

Please don't act as if Wikipedia articles about Microsoft Research projects
are some indication of the vanguard of operating systems.

> you can still x-ray some Go code and have a pretty shrewd idea what the
> resulting machine code looks like, just like C/C++.

Er, maybe in the trivial case. You won't have anything close to a
deterministic or real-time model for memory management available to you
though. And never will.

Everything Go provides on top of C in terms of extra features essentially
requires GC to function. (goroutines, and related features). The only thing it
really adds on top of C that doesn't necessitate the GC is most aspects of the
type system. That could be reused in a hypothetical systems language.

Go itself, in its current design, has absolutely zero future as a systems-
language-of-first-resort outside of tinkering. (Some people will cross-compile
Go code to chumbys and stuff.)

However, it has something more excellent available to it. A potential future
as the hero that slayed Java and supplanted it. I would rather it did that
than focused on trying to compete with C or C++. It's not suited to the task
and cannot be reconciled with that goal as a fundamental property of its
design.

Go is an excellent language for network services. Like Java. Doubly so once
they replace that awful garbage collector in 1.1

~~~
pjmlp
Maybe you should have a look at Native Oberon and BlueBottle.

They are both single-user desktop operating systems implemented in GC-enabled
systems programming languages, Oberon and Active Oberon respectively.

The only assembly is the boot loader, GC and some low level glue for device
drivers.

Everything else is written in plain Oberon or Active Oberon, and you get a
fully working desktop environment.

This has been used at ETHZ since the late 90's.
<http://www.ocp.inf.ethz.ch/wiki/Documentation/WindowManager>

------
hawleyal
Wow. OP might as well have said, "I haven't used Ruby much, don't understand
it, but still don't like it."

~~~
ehsanu1
By "OP", are you referring to the author of the blog post, Charles Nutter? It
doesn't seem like an easy feat to not understand Ruby when you've written a
major implementation of it, namely JRuby in this case.

It's also pretty hard to spend a very significant amount of your time
implementing a language you don't like. Finally, addressing his use of Ruby:
<https://github.com/headius>

~~~
hawleyal
Your logical fallacy is ad hominem or appeal to authority. It doesn't matter
who he is or what he has built.

The poster seems to favor a more complicated interface with a simpler
implementation. That is basically the opposite of what is great about Ruby: a
simple interface with more complex implementation.

You want a run-down of a couple of points, fine. These exemplify the
misunderstanding of the goal of Ruby.

> Fixnum to Bignum promotion

The entire point of class promotion is to remove the constraints that Java, C,
et al. inherit from the machine architecture. And then there's the transparent and
dynamic treatment of variables. Class promotion and dynamic typing allow a
more math-like syntax and usability.
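
A small sketch of what that buys you at the language level. The `factorial` helper is made up for illustration. On the Rubies of this era the two results below would report as Fixnum and Bignum; from Ruby 2.4 on both report as Integer, and the promotion is an internal detail:

```ruby
# Hypothetical helper: plain integer factorial, no explicit big-number type.
def factorial(n)
  (1..n).reduce(1, :*)
end

small = factorial(20)   # fits in a signed 64-bit machine word
large = factorial(25)   # does not, but the arithmetic just keeps going

puts small
puts large
puts large > 2**64
```

The caller never asks for a wider type. The flip side, as far as I can tell, is that every integer operation needs some form of overflow check so the runtime knows when to promote.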

> Closures

It is entirely freeing to have blocks treated as they are. Java is no
comparison; its overly verbose syntax and lack of dynamic variable scoping are
supremely limiting. Similarly, it is a question of brevity, least surprise,
transparency, and dynamism.
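
To make the block point concrete, a minimal sketch. The helper names are made up for illustration, and I'm assuming that "dynamic variable scoping" here means blocks being able to read and write the locals around them, which is ordinary closure capture:

```ruby
# Hypothetical helper: the block passed to map reads and updates `total`,
# a local variable of the enclosing method.
def running_totals(values)
  total = 0
  values.map { |v| total += v }
end

# Hypothetical helper: the returned lambda keeps `count` alive after
# make_counter itself has returned.
def make_counter
  count = 0
  -> { count += 1 }
end

p running_totals([1, 2, 3])
counter = make_counter
counter.call
p counter.call
```

The second helper is the case that costs something: `count` has to outlive the method call that created it, so it cannot simply live in a stack slot that disappears on return.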

