

Arel merge, rails' hidden gem - ben_h
http://benhoskin.gs/2012/07/04/arel-merge-a-hidden-gem

======
emiller829
Comments aren't available on the article, so a couple of quick clarifications
here:

Nitpicky, but ActiveRecord::Relation is not ARel. ARel is the relational
algebra library that underpins ActiveRecord (>=3.0). See
[http://erniemiller.org/2010/05/11/activerecord-relation-
vs-a...](http://erniemiller.org/2010/05/11/activerecord-relation-vs-arel/) for
more info.

The specific example he used to demonstrate worked transparently because of
the has_many :through relationship for users on Article, which requires the
collaborations table to filter the users, so joins it. Otherwise, you would
need to do the join yourself before merging the scope, and things get messy
pretty quickly, especially if you end up with table aliases (which the merged
relation knows nothing about -- it will still query against the collaborations
table).

I added "sifters" to Squeel (<http://github.com/ernie/squeel>) in order to
address the need for a set of reusable conditions on a specific model that
could work through an association. I'm not saying Squeel is the solution
you're definitely looking for, but I just want to let people know about the
things they'll need to look out for when using Relation#merge.

~~~
ben_h
Good point, #merge isn't itself an arel method. I say arel because it's the
core sitting behind the #merge / #where / #joins frontend -- but AR::Relation
deserves credit too :)

------
zzzeek
I've never been a fan of Rails' hieroglyphics. "The query is the one you’d
hope for" - why do we need to "hope", when the ORM could just allow you to use
relational concepts directly ? Here is SQLAlchemy's much less exciting version
of what I see here for "merge", just use a hybrid (sorry, we have more verbose
config, due to explicit is better than implicit):

    
    
        from sqlalchemy import Column, Integer, String, ForeignKey, Enum
        from sqlalchemy.orm import Session, relationship, backref
        from sqlalchemy.ext.declarative import declarative_base, declared_attr
        from sqlalchemy.ext.hybrid import hybrid_property
    
        class Base(object):
            @declared_attr
            def __tablename__(cls):
                return cls.__name__.lower()
    
        Base = declarative_base(cls=Base)
    
        class SurrogatePK(object):
            id = Column(Integer, primary_key=True)
    
        class Article(SurrogatePK, Base):
            headline = Column(String)
    
            @property
            def users(self):
                return self.collaborations.join("user").with_entities(User)
    
        class User(SurrogatePK, Base):
            name = Column(String)
    
        class Collaboration(Base):
            article_id = Column(ForeignKey('article.id'),
                                    primary_key=True)
            user_id = Column(ForeignKey('user.id'),
                                    primary_key=True)
            role = Column(Enum('editor', 'author'))
            user = relationship("User", backref="collaborations")
            article = relationship("Article",
                        backref=backref("collaborations", lazy="dynamic"))
    
            @hybrid_property
            def editorial(self):
                return self.role == 'editor'
    
        sess = Session()
    
        some_article = Article(id=5)
        sess.add(some_article)
    
        print some_article.users.filter(Collaboration.editorial)
    

you get the same "one line, DRY" calling style at the end and equivalent SQL:

    
    
        SELECT "user".id AS user_id, "user".name AS user_name 
        FROM collaboration JOIN "user" ON "user".id = collaboration.user_id 
        WHERE :param_1 = collaboration.article_id AND collaboration.role = :role_1

~~~
ben_h
Hope was definitely a bad choice of word. By that I meant, the sort of tight
query you'd hope an ORM would deliver.

Anyhow, it's a matter of taste, but what may appear at first as hieroglyphics
actually is straightforward. It's just that concision here means some packed
meaning and some assumed knowledge, so you have to know how to read it. In
this case, that's an easy trade-off for me.

I'm not familiar with SQLAlchemy, so I find your example equally hard to read,
compounded by there being much more code to spelunk through to understand.

Different strokes and all that, though. There's room for plenty of frameworks
:)

~~~
sirn
> I'm not familiar with SQLAlchemy, so I find your example equally hard to
> read, compounded by there being much more code to spelunk through to
> understand.

If you remove few column definition and setup code, it actually boils down to
just:

    
    
        class Article(ArticleColumns):
            @property
            def users(self)
                return self.collaborations.join("user").with_entities(User)
        
        class User(UserColumns):
            pass
        
        class Collaboration(CollaborationColumns):
            user = relationship("User", backref="collaborations")
            article = relationship("Article", backref=backref("collaborations", lazy="dynamic"))
        
            @hybrid_property
            def editorial(self):
                return self.role == 'editor'
    

In which you can now do

    
    
        some_article.users.filter(Collaboration.editorial)
    

which generates similar SQL query as #merge.

------
ef4
I've been using Arel directly (with ActiveRecord) for a project. It's 80% of
what I want, but then it inexplicably sucks at the last 20%.

The whole point of a relational algebra is that it's closed under all the
relevant operations. But Arel's implementation mostly ignores this fact, and
you get back different types of objects with incompatible APIs depending on
what operations you use and even what order you apply them in.

Concrete example one: "foo.union(bar).union(baz)" explodes because the union
operation is not composable.

Concrete example two: you can compose joins from the right but not the left.
So "(foo.join(bar)).join(baz)" works but baz.join(foo.join(bar))" explodes --
but not until later when you try to dump it to sql, at which point you get an
obscure exception.

When you look under the hood, you see that it's having a hard time with this
stuff because it doesn't really implement relational algebra. It's mostly just
an abstract syntax tree for SQL.

~~~
dkubb
The current version of ARel is not a relational algebra library, it's an SQL
compiler according to Aaron Patterson [1]. Many of the concepts and names for
internals are based on SQL not RA. There's nothing wrong with that, it's a
pragmatic choice for Rails given that ActiveRecord is an RDBMS only
abstraction.

I've been working on a relational algebra library for ruby (tentatively)
called veritas [2] where the sets are closed under all operations.

Given that it's a higher level abstraction than ARel the trade off is that
there's not a 1:1 mapping between it's pure RA ops and SQL operations. I err
on the side of producing SQL that will returns the correct results, which
means some of the queries are a bit verbose. I consider that only a temporary
problem though; I believe I can get to the point where most common queries are
identical to what you'd write by hand. I'm focused on correctness before
performance.

I've also written an optimizer [3] that takes the RA AST and rewrites it to be
smaller and more efficient. It handles lots of the low hanging fruit, but
there's still room for improvement. The advantage to this approach is that it
simplifies the AST for all targets, not just SQL. There's even room to do per-
target optimizations, which is nice because there is often multiple ways to
form a query, and some approaches may be more efficient than others on
different backends.

[1] <https://github.com/sconover/knit-js#footnotes> [2]
<https://github.com/dkubb/veritas> [3] <https://github.com/dkubb/veritas-
optimizer>

------
davidw
Here's one that's been bugging me - anyone want to take a crack at it?

    
    
        @dev_configs = DeviceConfig.
        joins("join (select device_id, max(updated_at) as max_updated_at 
        from device_configs group by device_id) dc2 
        on dc2.device_id = device_configs.device_id and   
        dc2.max_updated_at = device_configs.updated_at").
        includes("device").order("devices.updated_at desc")
    

Each device has many device configurations, and we want to display all
devices, along with the latest device configuration, and order the whole thing
by when the device was updated. The above works, but is mostly working
directly in SQL, rather than with Rails.

~~~
omarqureshi
This is something you should be able to do with Arel (with join_sources) or
through the use of to_sql

as a note - you shouldn't use includes and joins, includes hammers joins.

Simplest tidy up though is a scope for the join, something like

    
    
        DeviceConfig.joins("#{DeviceConfig.last_updated.to_sql} dc2 
                              on dc2.device_id = device_configs.device_id 
                              and dc2.max_updated_at = device_configs.updated_at).
                     order(...)
    

Alternatively maybe something like

    
    
        dc_t = Arel::Table::new :device_configs
        dc2_t = Arel::Table::new :device_configs
    
        inner = dc_t.group.(dc_t[:device_id]).
                     project(dc_t[:device_id].as("device_id"),
        dc_t[:updated_at].maximum.as("maximum_updated_at"))
    
        dc2_t.join(inner.join_sources).
              on(dc2_t[:device_id].eq(inner[:device_id])).
              on(dc2_t[:updated_at].eq(inner[:max_updated_at])).
              order(...)
    

im not entirely sure of the on syntax there, the API doesnt look clear.

Atleast, those are the two solutions I have though personally - i think it
reads better as straight SQL with select_values. That way, you know what is
happening.

------
yxhuvud
I know of merge, but usually I just don't need it. article.users.editorial
would usually be enough for my needs.

I'd have great usage of union (which doesn't exist, or doesn't exist in the
version of rails I'm stuck in) though.

------
joshmlewis
While I can't offer technical feedback like everyone else, I'd say tone down
that background a little bit for easier readability. :)

Edit: Instead of #CCC try #EAEAEA. It made a good difference for me.

------
chrismealy
If you're using has_ancestry merge can let you do some amazing things just by
adding .merge(path) or .merge(subtree) to a query.

