How I Made My Ruby Project 10x Faster (adit.io)
97 points by jjhageman on Aug 19, 2013 | hide | past | favorite | 57 comments


Is it wrong that I was secretly hoping the answer was "I rewrote it in something that isn't ruby"?


At this point on HN, I was expecting a page that just said "I rewrote it in Go"


Or Erlang/Elixir.


Never any love for Haskell... which is ironic considering his ruby library is essentially a (weak) runtime version of Haskell's type signatures.


Damn, busted. Elixir was the first thing that popped into my head on seeing this title. It looks almost like a Ruby port to the Erlang VM.

That said, good writeup, even for a non-Rubyist, enjoyed seeing the process.

One thing that stuck out is how control flow via if/then/else is faster than via case (for more than 2 options) in Ruby. I always assumed the opposite, for any language.


There's an older (but still interesting) post about case in Ruby here[1] and another here[2] with some extra insight into the MRI implementation. Having learned Ruby before most of the other stuff I play with these days, I always second-guess myself when using a switch/case construct in my code.

[1] http://www.skorks.com/2009/08/how-a-ruby-case-statement-work... [2] http://www.daniellesucher.com/2013/07/ruby-case-versus-if/
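If you want to check the case-vs-if claim on your own Ruby, here's a minimal sketch using the stdlib Benchmark module (the loop count and values are arbitrary; absolute timings will vary by Ruby version and machine):

```ruby
require 'benchmark'

N = 1_000_000
x = 3

Benchmark.bm(8) do |bm|
  # case dispatches through the === method on each candidate
  bm.report('case') do
    N.times do
      case x
      when 1 then :one
      when 2 then :two
      when 3 then :three
      end
    end
  end

  # if/elsif calls == directly, skipping the === indirection
  bm.report('if') do
    N.times do
      if x == 1 then :one
      elsif x == 2 then :two
      elsif x == 3 then :three
      end
    end
  end
end
```

Both branches compute the same result, so any difference you measure is pure dispatch overhead from `Integer#===`.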


Nice writeups, thanks!


Ruby is awesome, so yes that would be wrong. But is it wrong that I was secretly hoping the answer was to abandon the project?

Contracts are not needed, and they obfuscate the error raising. Just use plain Ruby if you need to raise an exception. I will seriously quit using Ruby if someone makes me use contracts.

But it looks like a fun project to write. Nothing wrong with that.


Yeah, it sounds like he should just use a statically typed language. (yes, I am aware that types != contracts, but they're similar enough that my statement still stands)


Wrong? Sometimes. Trendy? All of the time.


"how I made my toyota tercel 5x faster... (I sold it and bought a porsche)"


alias_method is faster than what he did before because he's replicating a bunch of internal logic at run-time, e.g., binding "self" to the correct thing. Somewhere deep in the bowels of the Ruby interpreter it's going to be doing the same thing, except it has the opportunity to create the binding statically at define-time. Likewise, he's calling a method on a Method object to invoke the original method vs. invoking the original method directly.

This might speed things up a little more, too:

  # inside the class being instrumented; `name` is the method's symbol
  alias_method :"original_#{name}", name
  class_eval %{
    def #{name}(*args, &blk)
      self.original_#{name}(*args, &blk)
    end
  }
"send" is going to be using extra machinery in the same way, although I think "send" also circumvents certain protections like public/private, so I'm not 100% sure where in the Ruby method invocation process it fits in. "send" might be closer to the metal, but a pre-defined method name would have the opportunity to cache certain things in the invocation process.
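To see the relative cost of the three invocation paths discussed here, a small sketch (the class and method names are made up for illustration):

```ruby
require 'benchmark'

class Widget
  def compute(a)
    a * 2
  end
end

w = Widget.new
m = w.method(:compute)  # pre-bound Method object
N = 1_000_000

Benchmark.bm(12) do |bm|
  # direct call: name resolved through the normal (cached) lookup path
  bm.report('direct')      { N.times { w.compute(1) } }
  # send: extra machinery, and it bypasses public/private checks
  bm.report('send')        { N.times { w.send(:compute, 1) } }
  # Method#call: a method call on the Method object wrapping the real one
  bm.report('Method#call') { N.times { m.call(1) } }
end
```

All three return the same value; the interesting part is only how each one reaches the method.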


All of the performance improvements combined were about 0.00002s per call. They came at the cost of simplicity.


This was my thought as well. This is one of the main fallacies of code optimization; unless these functions are called a LOT, you very well might be optimizing something that is not your bottleneck. You can see the author is missing this point because he talks about going from "20 seconds to 1.5 seconds." He is using those numbers as if they are real use case times, when in fact they are generated from an arbitrary amount of calls to a function.

So many times I will see people make code optimizations of this type to shave fractions of a millisecond from a function call at the cost of complexity, memory usage, etc. Meanwhile, most of the time their program is actually waiting on IO or something of the like.

Blind optimization is counter productive.


You don't know what project he was using that gem for. It's quite possible that he did some tests and found that this contracts library was the bottleneck. If that were the case, then the approach makes sense.

Although, given the nature of the library, I highly doubt that was the bottleneck


Yes, it is possible. If this was the case, however, it would have made much more sense to include this information in the blog post. I think people would have gotten more use out of someone explaining the process of finding the bottleneck and then optimizing.


Why do you think this is blind optimization? When do you think an optimization like this would be appropriate? Can you give an example of a performance optimization you have made recently?


> Why do you think this is blind optimization?

Because it is. It's a clear case of premature optimization, especially since nothing was offered up to suggest otherwise.

> When do you think an optimization like this would be appropriate?

When it's being reported back that it's one of the top bottlenecks. Before that, it's mostly an educational trip. It's great to learn how to write optimized code, but, frankly, removing empty function calls isn't an optimization.

> Can you give an example of a performance optimization you have made recently?

Eliminating the need for excessive memcache calls by not only changing the objects being called, but also by eliminating the need to make the calls in the first place by intelligently using existing locally cached… cache. If that makes sense. Anyways, that improved performance by ~20ms. It wasn't the slowest call running, but it was the slowest call running that could be optimized at this point.


Arguably, a lot of them came through improvements that did nothing to harm simplicity.

Removing the unnecessary callback, for example.

And ".times" is certainly more idiomatic than ".zip" for his case. It's a toss up between .each_with_index and .times, but really both will be readable enough. And from experience, you'll find Ruby devs that are unfamiliar with "zip", but not with "each_with_index" or "times".
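For readers unfamiliar with the three idioms, they build the same pairs; the difference is only in what gets allocated along the way (the arrays here are illustrative):

```ruby
args  = [1, 2]
specs = [:a, :b]

# zip allocates an intermediate array of pairs before you iterate
out_zip = args.zip(specs)

# each_with_index walks one array and indexes into the other
out_ewi = []
args.each_with_index { |arg, i| out_ewi << [arg, specs[i]] }

# times iterates over bare indices, allocating nothing extra
out_times = []
args.size.times { |i| out_times << [args[i], specs[i]] }

p out_zip == out_ewi && out_ewi == out_times  # prints true
```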

"alias_method" is also far more idiomatic than grabbing a reference to the method - I don't think I've ever actually seen method() used in production code (not that I've explicitly looked).


In most cases I agree with you. In this case though, the speed penalty could have been an argument not to use contracts.ruby...and I wanted to avoid that :)


A thought for you: Testing with respond_to etc. is great. But there's a special place in hell for you if you get people to start testing against specific classes.

It's almost always a horrible anti-pattern in Ruby to do "kind_of?" tests or similar, unless you depend on a huge portion of the interface of said class. But almost always code that does tests like that really only depends on a method or two, and doing class based tests makes the code vastly less reusable.

I don't want to dissuade you from this project. There are useful things you can do with contract specifications in Ruby that'd be interesting to explore. But I do hope to dissuade you from having all those class based tests in your tutorial for example...
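The class-based vs. behaviour-based distinction in one toy example (the method names are invented for illustration): StringIO quacks like an IO but does not subclass it, so a kind_of? contract rejects it while a respond_to? contract accepts it.

```ruby
require 'stringio'

# Class-based check: rejects any duck-typed stand-in
def render_strict(io)
  raise ArgumentError, "expected an IO" unless io.kind_of?(IO)
  io.write("hello")
end

# Behaviour-based check: accepts anything that can write
def render_duck(io)
  raise ArgumentError, "expected something writable" unless io.respond_to?(:write)
  io.write("hello")
end

buffer = StringIO.new
render_duck(buffer)       # fine
p buffer.string           # prints "hello"
# render_strict(buffer)   # would raise: StringIO is not a subclass of IO
```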


Runtime tests always cause a performance penalty. (And they do not prevent you from writing buggy programs! They just make the runtime environment yell at you if a bug is found.)

If you want to guarantee that a property holds for all the elements of a set (every possible running instance of a program), forget about the elements and work with the set's definition (the program itself).


Static analysis is really hard in general though, and in complex and highly dynamic languages like Ruby it's even harder. Some things you really are better off using runtime checks for...

And in the end, smaller overhead on the contract checker means that the programmer has more room to add extra tests and less reason to turn off the checks in production.


If you are developing a system meant to be stable and run for an indefinite period of time, static analysis imposes less computational overhead. You pay the price of static analysis only once. You pay the price of runtime checks every time you use the checked functionality. Even worse, if a check in the middle of an ongoing transaction fails, you must accept the cost of rolling back the whole transaction. From a purely computational point of view, static analysis is less expensive.

(Of course, this analysis of the situation changes drastically if you take into consideration the overhead on the poor programmers' brains. Static analysis is intellectually more demanding.)


Static analysis of any sizeable Ruby program is at best an unsolved problem, at worst impossible. A lot of Ruby libraries depend on eval(), and tracing down the sources of data for the eval string to verify that it isn't potentially user supplied (and often it can be) is hard enough; the moment the data may be user supplied, you've "lost": pretty much anything you thought you knew up to the eval() call can be wrong after the eval() call.

If you need/want static analysis, the option for the foreseeable future is to use another language (and I say this as a huge Ruby fan that also would love to see an evolution of Ruby that'd make some/more static analysis feasible, but for that to happen I believe you need an effort to provide a module encapsulating common eval() behaviour and a huge effort to replace eval in common Ruby libraries - it's not likely to happen anytime soon).
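A tiny illustration of why eval defeats static analysis (the class and attribute names are made up): the methods below only exist once the strings are evaluated at run time, so no pass over the source text alone can enumerate them.

```ruby
class Config
  %w[host port].each do |name|
    # Each string becomes a reader/writer pair when it is evaluated
    class_eval "def #{name}; @#{name}; end"
    class_eval "def #{name}=(v); @#{name} = v; end"
  end
end

c = Config.new
c.host = "example.com"
p c.host  # prints "example.com"
```

Here the names happen to be a literal list, so an analyzer could in principle follow them; the commenter's point is that once the eval string derives from external data, all bets are off.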


Do you mean something like Laser? https://github.com/michaeledgar/laser


LASER's homepage ( http://carboni.ca/projects/p/laser ) for some reason does not load in my browser. http://downforeveryoneorjustme.com/http://carboni.ca/project...

Judging the project from its feature list in the README.md, it does not seem like LASER is capable of handling arbitrary invariants.


Great point. Simplicity is one of the main reasons to use Ruby, not performance.


Ease of use. Which is not the same thing as simplicity.


Well, code can be organized in a simple way where the constructs of the language do not get in the way of conceptual clarity.

This may or may not translate to ease of use depending on the user.


My understanding of simplicity is that it can be nailed down to a few universal principles:

1. Facilitating local reasoning: I do not want to read the whole program to understand what a single line of code may or may not do.

2. Compartmentalizing concepts: There should be a single, easily identifiable feature for each of: defining classes of data, separating concerns between different code units, enabling code reuse.

3. Keeping abstraction leaks to a minimum: Abstraction leaks force corner cases to be handled outside of the abstraction (that is, low-level programming). So abstractions should ideally not be leaky, and leaks should only be tolerated if 1. there is no nonleaky (or less leaky) alternative, 2. the benefits of the abstraction outweigh the cost of handling the leak.

Simplicity leads to conceptual clarity.

On the other hand, ease of use is more about the pragmatic desire to invest the least amount of effort possible in making a working program. At its most short-sighted extreme, the easiest thing to do is simply following the path of least resistance. A more refined version is following design patterns that experience tells work okay.

Simplicity and ease of use are most certainly positively correlated, but it is perfectly possible for a horribly contrived construct to be easy to use, if its complex internal details happen to match nicely the problem domain one is working on.

My heuristic take (admittedly not justified by anything beyond aesthetic preference) is that simplicity is the more fundamental concept, and ease of use is just a corollary. And my impression of Ruby is that it seeks to maximize ease of use, resorting to non-simple means if necessary.


Indeed, it would be simpler if there was only "if" and no "unless", or multiple names for what are the same method (Array#size and Array#length), but (often) they're there to make things easier to read and write.


A more serious offender is the .to_s method: why on earth is nil.to_s == "", but [nil].to_s == "[nil]"?


What would you expect [nil].to_s to be? Seems like a reasonable representation to me. I certainly wouldn't want "['']", and I also wouldn't want nil.to_s to give me "nil". In practice having nil evaluate to an empty string is quite useful.


I think nil.to_s == "nil" is pretty reasonable. At least, it gives .to_s an elegant definition that does not need to case-analyze the nil-ness of the elements of an array.

In any case, this just reinforces my point that Ruby emphasizes ease of use, not simplicity.
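For reference, the behaviours being debated, as observed under MRI 1.9+:

```ruby
p nil.to_s    # prints ""      -- nil stringifies to the empty string
p [nil].to_s  # prints "[nil]" -- Array#to_s is an alias of inspect
p [nil].join  # prints ""      -- join calls to_s on each element
p "x#{nil}y"  # prints "xy"    -- interpolation also goes through to_s
```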


That may be so, but sometimes, once your design has settled, you'll have hotspots you want to speed up.

Of course I hope nobody starts jumping through all kinds of hoops before they know that it matters...


Sort of takes away from the "power" and "beauty" of the language when you need to dissect your code and understand the runtime to get reasonable performance.


I can't believe this is the top comment at the moment.

First, let's get something straight. This project wasn't necessary, because you shouldn't be type checking in Ruby if it is written correctly. That's basically what contracts are, at least in this case.

To address your concern, Ruby's performance is plenty fast- like Java entering the early 2000's fast. (You probably don't get my reference, but in the mid-1990's, people complained Java was slow.) No one at work has complained about the performance of our Rails app running on Unicorn, and we've not had to do one performance tweaking iteration ever.


I can't believe this is not a top comment anymore.

I don't have enough experience with Ruby to comment about it, but as far as Java is concerned, people (myself included) are still complaining it is slooooooooo... <let me pause a bit to collect garbage>... ooow. Don't get me wrong, Java is an excellent language which has plenty of other benefits (as I guess Ruby does too), but raw performance is not one of them.


I do get your reference. I've been a Java developer since 1996. 2000 was 13 years ago, and Ruby is now far older than Java was then, so it's hardly a valid comparison.

I am a ruby developer and ruby is dog slow.


Almost the same # years experience here in Java, then in Ruby for the last several.

Ruby today, on the (virtual) hardware we have now, is NOT dog slow the way Java was in the early 2000s on the hardware of those days.

Java is slower than writing machine code, but that doesn't mean I'm going to start coding in machine code.

If Ruby is dog slow for you, you are probably trying to run JRuby in development (vs. on a server), which is dog slow (though on a server, after startup and compilation, it is pretty darn fast), or you're starting Rails up every time you need to do anything (vs. the many ways around that).

If you've read that JRuby is faster, they are talking about the server. They didn't mean locally, recompiling all of the time.

It's true Ruby is older, but Ruby is usable today. I'm sorry that you had a bad experience but please don't spread FUD.


tl;dr: bolting on a dynamic-typed imitation of static-typing is slow.


I'd go for:

Avoid creating objects that aren't needed, and avoid method calls.

For the most part that's what his changes boil down to. In other words: reduce complexity. It's very easy to blow up Ruby performance because you don't realize how many extra objects you create (arrays and hashes + lots of chaining or functional-style methods that return copies instead of mutating == boom) or how many extra method calls you trigger behind the scenes.
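A sketch of the allocation difference (the numbers are arbitrary): the chained version builds a fresh array at every step, the single-pass version builds one.

```ruby
# Chained style: map, select, and map each allocate a new intermediate array
chained = (1..5).map { |x| x * 2 }.select(&:even?).map(&:to_s)

# Single-pass style: one result array, no intermediates
single = []
(1..5).each do |x|
  y = x * 2
  single << y.to_s if y.even?
end

p chained == single  # prints true
```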


More generally, avoid duplicating effort: memoize as much as possible, including branching over invariants.
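The classic memoization idiom, sketched with an invented class (note that ||= will not cache a result that is false or nil):

```ruby
class RateTable
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  # ||= caches the first result; later calls skip the expensive work
  def rate
    @rate ||= begin
      @lookups += 1  # stand-in for an expensive computation
      0.05
    end
  end
end

t = RateTable.new
3.times { t.rate }
p t.lookups  # prints 1 -- the expensive branch ran only once
```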


Contracts are not types and do not really behave like types in many ways.


They behave enough like types that the library in question is pretty much as far from idiomatic Ruby as you can get if you use it like in his examples and tutorial.

Constraining inputs to a specific class, for example, is a huge, giant no-no. If you're going to make Ruby statically typed, then you're better off picking a language that's easier to make fast.

This is a pet peeve of mine as I've more than once had to work around arbitrary, stupid checks against specific classes in cases where meeting the "actual" contract of the method called would be down to implementing a simple method or two.

The tutorial hints at some possibly Ruby-ish possibilities, like contracts based on whether an object responds to a method, but all the focus seems to be on class comparisons.

I do like the idea of specifying contracts, fwiw, as long as they're focused on behaviour and not on matching specific classes.

I don't, however, like the idea of specifying them in the class files. Specifying them in separate files that could be loaded when wanted, and that could also be used by a testing app to do some degree of automated testing of limits etc., or by other apps to do limited static analysis, would be interesting.


At their core, types and contracts are the same thing: logical propositions asserting properties of a program that have to be enforced somehow. The difference is the method of enforcement: types are enforced via static analysis and contracts are enforced via runtime checks.

Oh, and if what you meant was "types cannot check pre/postconditions", then you should have a look at dependently typed languages. Some dependently typed systems can express any proposition expressible in first-order predicate logic. (The downside is that satisfying the type checker becomes bloody hard.)


This is not correct.

Thank you for seeking to educate me about dependent typing, but I have a Masters in PL theory, have done numerous proofs about PL and type theory in Coq, and did my Masters' thesis on contract systems, under the person who invented contracts in higher-order languages. I do, in fact, know what I'm talking about. :) Unfortunately, it is quite a long explanation and needs a full blog post to cover in detail.

A few notes, though: contracts can feature blame, which types cannot, and types are proofs and contracts aren't (which can be both a plus and a minus depending on your situation).


Another PL theory person checking in...

- The word 'contract' has different meanings depending on which subcommunity you are talking to. For example, the Penn crowd often uses the word 'contract' in relation to refinement typing. The Scheme folks tend to think of the problem in different terms, but I am not convinced there is any substantial or insightful difference between the various definitions. As you seem to be an expert in the subject, I am sure you have read the manifest contracts paper, but I will leave this link here for the others: http://www.cis.upenn.edu/~mgree/papers/popl2010_contracts.pd....

- I have never heard anyone say that types cannot feature blame. I would argue that, for example, Haskell's type checker is quite good at blaming the correct party for a type violation. To argue the opposite you would need a precise definition of blame, and I am aware there are several papers written on this subject. I am sure you could find a definition of blame that excludes static type systems, but I am not sure that definition would be useful.

- Types are not proofs, but typing derivations are. Even if you consider types and contracts to be different, the execution trace serves as a proof of a dynamically checked 'contract' (which is really just a derivation in an operational semantics).


> Types are not proofs, but typing derivations are.

This is definitely true, thanks for the correction.

> I have never heard anyone say that types cannot feature blame. I would argue that, for example, Haskell's type checker is quite good at blaming the correct party for a type violation.

I agree this is dependent on how you construct blame. In the sense I was talking about, it would be difficult to construct a type system which blamed in the same way -- the type system would essentially require a runtime checker for independent pieces of the program crossing the contract boundary, which I have seen constructions of, but is generally not seen as standard.

For background, my contract work was in Racket, so I am of the Scheme school :)


dang, ninja'd me


Types themselves are not proofs. In particular, the proposition associated to a type might be a blatant lie - you simply should be unable to find a term for that type. Also, the notion of blame in a contract maps directly to the type checker telling you which term does not type check.


I really enjoy write-ups like these. Anyone have a resource that is primarily blog posts/articles like this?


The rest of his blog is pretty great -- http://www.adit.io/index.html

Some of them have pretty cool original illustrations, too!


Does anyone know what's going on behind the scenes with `alias_method`? I'm curious to know how it's implemented such that it was actually faster!


Would have been more useful if there were more before-after examples.


Did you rewrite it in Go? I guess I should read the article.

OK, now that I've read it, profiling ftw :-)



