
Performance cost of Ruby ORM model instantiation  - jamesbritt
http://merbist.com/2012/02/23/quick-dive-into-ruby-orm-object-initialization/
======
mattetti
Thanks for posting my article. Quick update, someone did more benchmarks with
different versions of Ruby and adding MongoMapper:
<https://gist.github.com/1894055>

~~~
jamesbritt
Thanks for the update. I like seeing articles that cover tangible, measured
detail. It's the kind of stuff that's often overlooked in favor of gee-whiz
syntactic sugar.

------
judofyr
And in the same way: measure how long it takes to get an answer from your DB
before you start worrying about these numbers.

~~~
ssmoot
It's been a long time, but I think your assumption is probably flawed.

DataMapper isn't faster by accident. This benchmark is trivial. Try loading up
1,000 objects with an actual query for each O/RM and benchmark that. Then try
a single object 1,000 times.
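A minimal sketch of the instantiation side of that benchmark, with the caveat that `Row` and its attributes are invented for illustration and there is no database behind it; a real comparison would run actual queries through each O/RM as suggested above:

```ruby
require "benchmark"

# Hypothetical plain-Ruby stand-in for an ORM model: attributes are
# assigned through method dispatch, as AR-style models do, but this
# only isolates the object-building cost, not the query cost.
class Row
  attr_accessor :id, :name, :email

  def initialize(attrs)
    attrs.each { |k, v| public_send("#{k}=", v) }
  end
end

attrs = { id: 1, name: "a", email: "a@example.com" }

Benchmark.bm(18) do |x|
  # "back end": materialize 1,000 objects from one result set
  x.report("1,000 at once:") { 1_000.times { Row.new(attrs) } }
  # "front end": 1,000 separate single-object fetches, simulated by
  # rebuilding the result wrapping each time
  x.report("1 object x 1,000:") { 1_000.times { [attrs.dup].map { |a| Row.new(a) }.first } }
end
```

Even this toy version makes the method-dispatch overhead per attribute assignment visible once you scale the row count up.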

Instantiation is neat, but "back in the day" my findings were that you often
spent just as much time _if not more_ simply building the objects on the Ruby
side as any reasonable database query you might be executing.

In fact the cost was so prohibitive it quickly became obvious that simple
things in ASP Classic like selecting 2,000 rows and rendering them in a table
on a web-page could in many situations be so slow as to be impractical in
Ruby. Especially since the entire request is buffered (that's more or less
still true today, the flushing available in Rails isn't nearly the equal of
writing directly to a TCP socket).

Method dispatch kills.

The example of selecting a single object 1,000 times nicely illustrates this
since it stresses the "front end" of your query-interface as opposed to the
"back end" of materialization.

The price of admission in Ruby is _absurdly_ high. It doesn't take years to
build an O/RM because it's easy. ;-)

~~~
judofyr
> It's been a long time, but I think your assumption is probably flawed.

My assumption? My assumption was merely that you should measure it _before_
you start worrying about it. I'm not trying to say that it isn't a problem (it
might very well be), just that you should measure first.

> In fact the cost was so prohibitive it quickly became obvious that simple
> things in ASP Classic like selecting 2,000 rows and rendering them in a
> table on a web-page could in many situations be so slow as to be impractical
> in Ruby.

I agree, rendering 2,000 rows isn't the regular use-case for ActiveRecord.
If you're attempting to do so, it would be silly to create objects for all of
them. AR isn't "magic sauce"; it's a convenience library for making things
easy to work with.
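A rough sketch of what skipping object creation buys you. `Model` here is a plain Struct standing in for an AR model (a real AR instance does far more work per row), and the row data is invented:

```ruby
require "benchmark"

# Illustrative stand-in for an ORM model; a Struct keeps the sketch
# self-contained, though a real AR model is much heavier per instance.
Model = Struct.new(:id, :title, :body)

# 2,000 raw rows as the driver might hand them back
rows = Array.new(2_000) { |i| { id: i, title: "t#{i}", body: "b#{i}" } }

Benchmark.bm(16) do |x|
  # render straight from raw hashes: no per-row model build
  x.report("raw hashes:") { rows.map { |r| r[:title] } }
  # wrap every row in a model object first, then read the attribute
  x.report("model objects:") { rows.map { |r| Model.new(r[:id], r[:title], r[:body]).title } }
end
```

The point being: for a read-only 2,000-row table render, the raw rows are all you need, and the convenience layer is pure overhead.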

> Especially since the entire request is buffered (that's more or less still
> true today, the flushing available in Rails isn't nearly the equal of
> writing directly to a TCP socket).

Buffering is the only sane default if you want safety. If you don't buffer,
and there's an error in the middle of the response, you've already sent a
broken page with a 200 response (cacheable and everything).
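A hedged sketch of that failure mode, with invented render methods and a plain String standing in for the socket: with a fully buffered response, an exception mid-render can still become a 500; once you stream, the 200 status is already on the wire.

```ruby
# Buffered: nothing reaches the client until the render succeeds,
# so an error can still be turned into a proper 500.
def buffered_render
  body = +""
  body << "header\n"
  raise "template blew up"   # error in the middle of the response
  body << "footer\n"         # never reached
  [200, body]
rescue
  [500, "error page"]        # nothing was sent yet, so we can recover
end

# Streaming: the status line goes out first, so by the time the
# template blows up, the client already has a cacheable 200.
def streaming_render(socket)
  socket << "HTTP/1.1 200 OK\r\n\r\n"
  socket << "header\n"
  raise "template blew up"
rescue
  socket << "...truncated..."  # best we can do now
end

status, _page = buffered_render
sent = +""
streaming_render(sent)
```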

~~~
ssmoot
> Buffering is the only sane default if you want safety.

_A LOT_ of sites on the web survived for many years without it. And they felt
more responsive because of it.

I think it's a fair trade-off to optimize for your user experience instead of
the machines. Honestly, Google is not going to ding you for this, and as long
as you've got an exception notifier going and are on top of it, there's no
reason it should become a real problem for _most_ sites. Being able to make a
choice per-request/action would be nice, of course.

> My assumption was merely that you should measure it before you start
> worrying about it.

I have. I wrote the original versions of DataMapper (which BTW, _was_ a
DataMapper despite popular belief. The methods hanging off the models were
just helpers to the Session originally, but it lost its way at some point).

I can tell you for a fact that it's slow. Sure, go ahead and measure it. I'm
not saying that's _bad_. Just apply some judgement to the situation. Are you
grabbing a lot of rows? Are you performing lots of individual selects? (Or
using AR+include options where each include is effectively the same as another
query?)
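A minimal sketch of that last point, counting queries with a fake adapter; `FakeDB`, the table names, and the SQL strings are all invented for illustration:

```ruby
# Fake adapter that just counts the queries it is asked to run.
class FakeDB
  attr_reader :queries

  def initialize
    @queries = 0
  end

  def select(sql)
    @queries += 1
  end
end

post_ids = (1..50).to_a

# N+1 pattern: one query for the posts, then one per post for comments
n_plus_one = FakeDB.new
n_plus_one.select("SELECT * FROM posts")
post_ids.each { |id| n_plus_one.select("SELECT * FROM comments WHERE post_id = #{id}") }
# n_plus_one.queries => 51

# Batched, include-style load: one query for posts, one IN-list query
batched = FakeDB.new
batched.select("SELECT * FROM posts")
batched.select("SELECT * FROM comments WHERE post_id IN (#{post_ids.join(',')})")
# batched.queries => 2
```

Either way, each extra query is a round-trip plus another batch of rows to materialize, which is exactly where the per-object cost compounds.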

You can feel free to write slow code until it becomes a problem, or you can
develop some good practices _up front_, before ever needing to profile, to
prevent much of that. Disregarding reasonable optimization and good habits
based on generalized rules of thumb is just as evil as premature optimization.
We're not talking about twiddling minutiae here.

> it would be silly to create objects for all of them

Why?

Hibernate can do it. NHibernate can do it. LLBLGenPro could do it (probably
the closest to Ruby's Sequel, but here it was a non-issue even with Models).
Simpler AR-style O/RMs like Wilson O/R Mapper could do it.

Given that AR doesn't really provide a decent DAL tool that I can recall, I'd
say it's perfectly reasonable to expect that the "Full Stack" framework you're
using covers the bases without nasty surprises. Especially if you've come from
a background where millions of method-dispatch events impose nanoseconds of
overhead instead of milliseconds (Java and .NET at least).

I don't think it's at all "silly". I think it's a perfectly reasonable goal.

------
dmmalam
My experience has been that ORMs generally 'suck' outside the trivial CRUD
use-cases, and it's usually faster and simpler just to write native queries
against your database.

I've used Hibernate, LINQ, ActiveRecord and Mongoose in real production
applications, and pretty much every time the same sequence of events happens.

1) Wow, this ORM is so cool; I've saved so much time as I don't need to learn
anything about the underlying DB.

2) New feature comes in that requires something other than CRUD. Oh, how do I
do this with the ORM? Either you extend it with plugins, or you learn the ORM
query language that's kinda like the native query language, but worse.

3) Now we've got some real data; why is it slow as hell? Oh, those 'cool'
abstractions are creating n^2 queries, implicit joins, etc.

4) Let's rip apart the ORM and use a thin layer that just executes the native
queries.
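Step 4 might look something like the following sketch: named native queries behind a tiny interface, with no ORM in sight. The class, method names, and the `conn` interface (an object responding to `exec`) are assumptions for illustration:

```ruby
# A thin layer over a database connection: hand-written native
# queries registered by name, executed with bound parameters, and
# no per-row object materialization.
class ThinLayer
  def initialize(conn)
    @conn = conn
    @queries = {}
  end

  # register a hand-written native query under a name
  def register(name, sql)
    @queries[name] = sql
  end

  # run it with bound parameters; rows come back as the driver
  # returns them, untouched
  def run(name, params = [])
    @conn.exec(@queries.fetch(name), params)
  end
end
```

The abstraction is one Hash lookup deep, so there is nothing between your feature code and the native query to surprise you.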

A few years back I worked on a quant finance application that had 100GBs of
tick data in an Oracle DB, and the only way to get reasonable performance AND
abstraction was to write optimised stored procs for each feature in the end
application, and have the middle tier just call them through a thin
abstraction layer.

At our startup dump.ly, we started with using Mongoose with Mongo because it
was so easy just to persist some JSON data. However, very quickly we went
through the above loop, and are now 100% Redis, with a thin layer that
executes native Redis commands. It just ended up being simpler and faster.

Takeaway is that in general ORMs are an anti-pattern. Every DB (noSQL or
yesSQL) is a compromise between a set of features, performance, consistency,
scalability, etc. You need to understand these in detail, and hence they
should not be abstracted away.

------
dools
Sequel looks pretty interesting. I've really fallen out of love with
ActiveRecord ... mostly because it only simplifies the most basic scenarios,
and as an abstraction it's so very "leaky", as Joel would say. I've begun work
on an idea for an ORM that just simplifies the most tedious parts of writing
SQL but requires no boilerplate classes and is light and speedy:
<https://github.com/iaindooley/PluSQL> (it currently only supports MySQL
because it requires buffered query sets).

------
adrianmsmith
"Never assume, always measure" - Never a truer word spoken!

~~~
cicloid
Words to live by…

------
scriby
I worry that this sort of article could be misinterpreted easily, even with
the disclaimer.

It's talking about optimizing something that, for the vast majority of
applications, takes less than 1% of the cycles. Which doesn't really mean
anything... Be careful about letting this sort of benchmark
disproportionately affect decision making.

~~~
mattetti
I hear you, but my point wasn't to show how to optimize AR, but instead
what's going on when you initialize a model. In other words, it's more about
software design and architecture decisions than performance.

~~~
scriby
"Performance cost of Ruby ORM model instantiation" may not have been the best
title :)

~~~
mattetti
Agreed, the post title is actually "Quick dive into Ruby ORM object
initialization".

------
railsmax
Thanks for the great post!!! It is really useful.

