
Real-time event aggregation at scale using Postgres with Citus - samaysharma
https://www.citusdata.com/blog/2016/11/29/event-aggregation-at-scale-with-postgresql/
======
ahachete
The whole idea about the post is quite interesting. Definitely shows the high
performance capabilities of the system. I really appreciate the addition of
the parallel insert via INSERT ... SELECT which is quite handy.

I have a question about the topic of the post. The method presented could also
be simulated with a MATERIALIZED VIEW? Are they supported in Citus? In any
case, the technique presented is pretty cool :)

~~~
ozgune
(Ozgun from Citus Data)

We're planning on supporting VIEWS (but not MATERIALIZED VIEWs) in Citus 6.1:
[https://github.com/citusdata/citus/issues/442](https://github.com/citusdata/citus/issues/442)

If you're interested in using Materialized Views in Citus, please feel free to
comment on the issue or open a new one. We'd be happy to incorporate your
feedback into upcoming releases.

You're also correct in that the method presented in this blog post could be
simulated with a distributed materialized view. One potential difference could
be that materialized views in PostgreSQL currently refresh the entire data
from source tables. This method provides flexibility to do rolling inserts or
upserts.

~~~
ahachete
+1 for VIEW support :)

I agree that the suggested method is more flexible (which means more
performance in this kind of environments/problems). Mat views provide
simplicity. I'd say not a must, but they may be useful for users who don't
want to dig deep into these techniques.

------
georgewfraser
Pre-aggregating data like this is a last resort in data warehousing. It's
fragile: in this example, it depends on your data coming in exactly-once, in-
order. It's also the opposite of future proof: soon, you will have a query
that your pre-aggregation can't support, and you'll have to build more pre-
aggregations.

Pre-aggregation is still a tool in the toolbox, but you should first consider
switching to a purpose-built column-store like Redshift/BigQuery/Snowflake.

~~~
glial
> you should first consider switching to a purpose-built column-store like
> Redshift/BigQuery/Snowflake.

True, unless you have regulatory requirements that you can't store your data
off-site.

~~~
nattaylor
Something like self hosted HDFS + Parquet + Presto/Hive would work though,
wouldn't it?

~~~
georgewfraser
Well there's lots of good on-prem columnar data warehouses. They're expensive,
but you have to think in terms of total cost of ownership: a good warehouse
will require less time from your team building workarounds.

~~~
greggyb
I'm not sure what your threshold of expensive is, but this is available in MS
SQL Server. You don't necessarily need to purchase an application solution or
a specific DW RDBMS.

[https://www.microsoft.com/en-us/sql-server/sql-server-
editio...](https://www.microsoft.com/en-us/sql-server/sql-server-editions)

------
brightball
... _tears of joy_ ...

