
Effectively Using Materialized Views in Ruby on Rails - moritzplassnig
https://pganalyze.com/blog/materialized-views-ruby-rails
======
code_biologist
Overall, I love this approach, but the big pain for me is the lack of
incremental view refreshes. I end up needing to recreate giant tables each
refresh even though the underlying changes are small. Sure I could implement
that myself, but that sacrifices the correctness guarantees that materialized
views provide!

Two things that would help:

1. Getting Incremental View Maintenance (IVM) [1] into Postgres. It looks
like work is beginning on this but it's been 7+ years coming. If there are any
Postgres devs looking at this, I'm cheering for you!

2. There's a commercial group doing an implementation of this, Materialize
[2], but they don't have any publicly released product yet. I mention it
because their interview on Data Engineering Podcast is really good [3] and I
really like their focus on correctness by working from replication logs.

[1] [https://wiki.postgresql.org/wiki/Incremental_View_Maintenance](https://wiki.postgresql.org/wiki/Incremental_View_Maintenance)

[2] [https://materialize.io/](https://materialize.io/)

[3] [https://www.dataengineeringpodcast.com/materialize-streaming-analytics-episode-112/](https://www.dataengineeringpodcast.com/materialize-streaming-analytics-episode-112/)

~~~
meritt
Likely too boring and old school data warehousing, but this is what the notion
of rollup/aggregate tables achieve. You define a table that's the result of
some sort of an aggregation query (time is almost always involved: minutely,
hourly, daily, monthly plus other business-specific dimensions) and at the
conclusion of each corresponding interval you insert the next respective
batch.

    
    
        -- e.g. at the end of each hour, append that hour's batch
        insert into rollup_clicks_hourly
        select hour, count(*) as clicks
        from clicks
        where created_at >= ? and created_at < ?
        group by 1
    

Then from a reporting perspective, users/dashboards/etc read from the highest
relevant table that meets their criteria.
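
The read side is just as simple. A hedged sketch reusing the `rollup_clicks_hourly` table from the example above (the `hour` and `clicks` column names are assumptions):

    
        -- a daily dashboard reads the hourly rollup instead of scanning raw clicks
        select date_trunc('day', hour) as day, sum(clicks) as clicks
        from rollup_clicks_hourly
        group by 1
        order by 1;
    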

~~~
purerandomness
That's a very clever concept.

Could you recommend any classic old-school books about data warehousing that
you've read that teach more techniques like this?

Thank you!

~~~
hobs
It's pretty much the Kimball books: [https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/)

~~~
meritt
Yup, exactly. Kimball's stuff is the best. You can achieve on a modest machine
what would otherwise require an incredibly expensive, horizontally scaled MPP
database. It does require a lot more planning and forethought, to be certain.

[https://www.amazon.com/dp/product/1118530802](https://www.amazon.com/dp/product/1118530802)
and possibly
[https://www.amazon.com/dp/0764567578/](https://www.amazon.com/dp/0764567578/)

------
semiquaver
I used a technique very much like this in the past. Here are two potential
gotchas when using matviews with Rails, both related to the inevitable reality
that schemas evolve over time:

- Postgres supports concurrently refreshing the contents of an existing
materialized view, but there's no built-in way to change the structure of the
matview concurrently. That means any `ALTER MATERIALIZED VIEW` will block all
reads of the view, potentially for many minutes or longer on views over
nontrivial data. Adding a column to a table is a cheap and non-blocking
migration, but getting the new column to appear in matviews can require an
outage or fancy transactional view-name swapping that libraries like scenic
don't support.

- a column that is included in a view of any kind cannot be changed in any
way, even in ways that are binary-compatible like VARCHAR->TEXT. This comes up
surprisingly often, and working around it is annoying.

~~~
e12e
How about creating a new materialized view, then renaming (or dropping) the
old one and renaming the new one into place? That's assuming you don't need
model changes, or can guard/fence them somehow.
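
A minimal sketch of that swap in Postgres (view names and the query itself are illustrative); the renames only touch the catalog, so they're fast:

    
        -- build the replacement under a temporary name (slow, but non-blocking)
        CREATE MATERIALIZED VIEW active_users_new AS
          SELECT id, email FROM users WHERE active;
    
        -- swap names in one transaction so readers never see a missing view
        BEGIN;
        ALTER MATERIALIZED VIEW active_users RENAME TO active_users_old;
        ALTER MATERIALIZED VIEW active_users_new RENAME TO active_users;
        COMMIT;
    
        DROP MATERIALIZED VIEW active_users_old;
    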

------
toasterlovin
View Backed Models are a super power. If you’re a working Rails dev and you’ve
never used them before, you really owe it to yourself to check them out. IMO,
they should be baked into Rails itself. They’re that useful.

------
j4mie
Are there any sensible ways to do this with Django? Ideally with migrations to
create and update the views.

~~~
mariocesar
> Are there any sensible ways to do this with Django? Ideally with migrations
> to create and update the views.

There are apps that integrate PG views as Django models, but it's fairly
trivial to do it yourself.

1. Create a migration that runs the SQL to create the view, e.g. `CREATE VIEW
active_users AS SELECT ...`

2. Create a model mapping all the fields you want to use, set `Meta.db_table`
to the name of the view, and set `Meta.managed = False` so the model is left
out of your migrations in the future.

Use it like any other normal model.

That's all.

As for updating views with migrations: if you're already planning to use
views, you can simply create migrations that drop and recreate them when
needed. That's what I do, and it's very little work.
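
For example, a later migration's forward SQL can be nothing more than a drop-and-recreate (the view definition here is a placeholder):

    
        DROP VIEW IF EXISTS active_users;
        CREATE VIEW active_users AS
          SELECT id, email FROM users WHERE is_active;
    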

~~~
mariocesar
Here is an old article that I used to implement my own: [https://www.fusionbox.com/blog/detail/using-materialized-views-to-implement-efficient-reports-in-django/643/](https://www.fusionbox.com/blog/detail/using-materialized-views-to-implement-efficient-reports-in-django/643/)

------
JangoSteve
I once worked on a larger Rails project where we had decided to utilize
Postgres views and materialized views to optimize the performance of parts of
our application. This was about 7 years ago, so before Scenic existed to help
with this (or at least before it was known). We had to manually manage the use
of our views while keeping Rails apprised of the underlying schema through
custom schema files and migration methods.

One of the main issues we had run into, in addition to all the complexity that
comes with managing a cache and no longer being able to count on your db
queries to return the latest information, was that our views had to be deleted
and re-created for nearly every schema change (at least every schema change
that affected the views, but we had so many views, this ended up being
probably 90% of our schema changes).

We ended up overriding the standard Rails "up" and "down" migration methods to
prepend the deletion of these views, append their re-creation, clean up after
failed migrations, etc. I remember us spending a good amount of time across
the team dealing with weird edge cases that cropped up from this. I also
remember spending a fair amount of time training all the developers on these
issues since almost no one on the team had experience using or dealing with
database views. I assume managing migrations and schema of your views within
Rails is all included in what Scenic does for you for free, since these
hassles weren't mentioned in the article.

It's really nice to see something that handles all that complexity for you.
However, that still leaves the standard and age-old caching issues with stale
data. Another side effect was that they started being a bandaid for
inefficient queries and data structures in the database, which is probably how
we ended up with so many db views in the first place.

In the end, after probably 8 months of using these views, we decided to excise
them from the application entirely and go back to optimizing our actual
database queries. We got to delete a lot of custom code, our schema changes
and migrations became simpler, our data became fresh again, and we got to
reinvest that time into improving our underlying data structures and queries.

This probably is not an argument against using db views, but rather an
anecdote of what can happen when you resort to them prematurely. If you have
queries that are 100ms or more, there are probably more traditional
optimizations you can find, such as restructuring your relational data,
adding/removing indexes, etc. If you're actually trying to eliminate those
last 10s of milliseconds from your queries though, as is shown in the article,
I can see them being a good option.

------
gigatexal
Another approach could be to create the materialized view and then use some
sort of event system, say the pg_notify subsystem, to capture an event and
increment some aggregation -- this would eliminate the need for scheduled view
refreshes and keep the view almost real-time and quick.
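
A rough sketch of the notification side, assuming a `clicks` table; a separate worker process would `LISTEN` on the channel and apply the increments asynchronously:

    
        -- fire a notification for every new click
        CREATE FUNCTION notify_click() RETURNS trigger AS $$
        BEGIN
          PERFORM pg_notify('clicks', NEW.id::text);
          RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;
    
        CREATE TRIGGER clicks_notify
          AFTER INSERT ON clicks
          FOR EACH ROW EXECUTE PROCEDURE notify_click();
    

One caveat: NOTIFY is fire-and-forget, so events delivered while the listener is down are lost; a periodic full refresh as a backstop is still prudent.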

~~~
udfalkso
This might be best done using triggers. I've had good experiences with the
Hair Trigger gem for managing these in rails.

~~~
gigatexal
A trigger would work, but going the pg_notify route allows for asynchronous
updating, whereas the trigger approach could slow down writes.

------
pdkl95
"There are two hard things in computer science: _cache invalidation_ , naming
things, and off-by-one errors." (Phil Karlton, _et al_ )

Views (both regular and materialized) are really useful, but remember that
storing data in two places really is one of the hard problems in CS.
Refreshing the materialized view on a timer might seem easy, but 6 months from
now, something important reading from the stale view instead of the real
table could turn into a really frustrating bug. A materialized view might
still be a worthwhile optimization; just remember that you're also taking on
cache management, which might be more complicated than you suspect.

~~~
code_biologist
Presumably if you're materializing a view, you have to have a cache for speed
— it's either this or ElasticSearch or Redis. For me, where possible, it's
nice to have a correct cache a `REFRESH MATERIALIZED VIEW` away instead of
needing to make sure your cache book-keeping is perfect with other solutions.

But yes, you're right that this is caching, with all associated pitfalls.
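
For what it's worth, the plain refresh takes an exclusive lock, while `CONCURRENTLY` keeps the view readable during the rebuild at the cost of requiring a unique index (view and column names here are illustrative):

    
        -- plain refresh: readers block until the rebuild finishes
        REFRESH MATERIALIZED VIEW user_stats;
    
        -- concurrent refresh: reads continue, but a unique index is required
        CREATE UNIQUE INDEX user_stats_user_id_idx ON user_stats (user_id);
        REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;
    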

------
drusepth
Is there an upper limit on the number of materialized views that it makes
sense to have in a database?

I've been trying to figure out a way to deal with my (growing) join of 13
rather large tables when fetching all of a user's data. Are materialized views
the kind of thing that I could generate these cached results _per user_ and
have, like, hundreds of thousands of them sitting around to query? And then be
able to query against all of user 123456's content directly instead of
filtering N tables for their content and joining them all together every time?

------
geori
How often does rails refresh the materialized views? And is it automatic or do
you have to explicitly refresh them?

~~~
lfittl
You typically do it on a schedule, e.g. once per hour, or once every day.

In the post the whenever gem
([https://github.com/javan/whenever](https://github.com/javan/whenever)) is
used, but you could use anything that is able to run things periodically.

If you want to stay solely within Postgres, there is also pg_cron
([https://github.com/citusdata/pg_cron](https://github.com/citusdata/pg_cron))
which could be used to call REFRESH MATERIALIZED VIEW from within the
database.
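
For example, pg_cron schedules are just SQL (the view name here is illustrative):

    
        -- refresh nightly at 03:30 via pg_cron
        SELECT cron.schedule('30 3 * * *', 'REFRESH MATERIALIZED VIEW user_stats');
    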

