
PipelineDB v0.9.9 – One More Release Until PipelineDB Is a PostgreSQL Extension - Fergi
https://www.pipelinedb.com/blog/pipelinedb-0-9-9-one-more-release-until-pipelinedb-is-a-postgresql-extension
======
craigkerstiens
PipelineDB is pretty interesting for time-series data. It takes an approach to
processing the data as it comes in, and storing aggregates or pre-aggregates
over time series. I haven't followed the latest, but as of a few years ago
much of the approach was similar to some research out of UC Berkeley from
about 10 years ago. You can find the paper describing that work
(TelegraphCQ, where CQ stands for continuous query) at
[http://db.csail.mit.edu/madden/html/TCQcidr03.pdf](http://db.csail.mit.edu/madden/html/TCQcidr03.pdf).
Definitely an interesting read if you're into technical papers and databases.

~~~
manigandham
Druid also does this, with pre-aggregation of streaming data along predefined
dimensions for very fast cube-based analytics. It's not a relational database,
though, and it is only now getting a SQL interface through Apache Calcite.
[http://druid.io/](http://druid.io/)

Imply is a startup with a modern cloud/on-prem distribution of Druid with a
built-in visualization and querying tool:
[https://imply.io/](https://imply.io/)

------
isoprophlex
I didn't know the product at all, but at a glance this looks amazing to me for
BI/alerting on streaming time series data.

Would anyone who has used it care to chime in on whether it fit your
requirements for time series data processing? Thanks!

~~~
iaabtpbtpnn
If it's a Postgres extension for time-series data, I wonder how it compares to
TimescaleDB, which I recently discovered and have been evaluating.

~~~
chucky_z
I don't know TimescaleDB, but PipelineDB is more about real-time, continuous
aggregation. You can maintain something like a rolling sum that receives many
inserts a second for a very long time, and the speed of both the inserts and
retrieving the sum remains pretty constant.
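To illustrate why insert and read cost can stay flat, here is a minimal sketch of one way to keep a rolling sum over the last N seconds, assuming fixed one-second buckets and timestamps that arrive in order. The `RollingSum` class and its methods are hypothetical and are not PipelineDB's implementation; they only show the general idea of keeping a running total instead of rescanning raw rows.

```python
from collections import deque


class RollingSum:
    """Hypothetical rolling sum over the last `window` seconds.

    Only per-second bucket totals are kept, never the raw values, so memory is
    bounded by `window` buckets. Each bucket is appended once and evicted once,
    so inserts and reads are amortized O(1).
    """

    def __init__(self, window: int) -> None:
        self.window = window
        self.buckets: deque = deque()  # entries are (second, bucket_total)
        self.total = 0.0  # running total of all buckets still in the window

    def _evict(self, now: int) -> None:
        # Drop buckets that fall outside the window (now - window, now].
        while self.buckets and self.buckets[0][0] <= now - self.window:
            _, expired = self.buckets.popleft()
            self.total -= expired

    def insert(self, ts: int, value: float) -> None:
        # Assumption: timestamps arrive in non-decreasing order.
        self._evict(ts)
        if self.buckets and self.buckets[-1][0] == ts:
            second, bucket_total = self.buckets[-1]
            self.buckets[-1] = (second, bucket_total + value)
        else:
            self.buckets.append((ts, value))
        self.total += value

    def value(self, now: int) -> float:
        # Reading the sum never touches individual inserts.
        self._evict(now)
        return self.total
```

For example, with `rs = RollingSum(window=60)`, inserting 5 and 3 at second 0 and 2 at second 30 gives `rs.value(30) == 10`, and by second 60 the bucket for second 0 has expired so `rs.value(60) == 2`.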

~~~
grammr
Hi there! I'm one of the PipelineDB founders. This description is correct. The
unique thing about PipelineDB is that it doesn't store granular data. Once all
aggregates are incrementally updated, the raw input rows are discarded and only
aggregate output is stored.

This approach dramatically limits disk IO and long-term storage requirements,
and enables super high performance in most cases on modest hardware.
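As a rough illustration of "update the aggregates, then discard the row", here is a minimal sketch of a grouped aggregate that is maintained incrementally, roughly what a query computing count, sum, min, max and avg per key would produce. The `AggState` and `ContinuousView` names are hypothetical and this is not PipelineDB's storage engine; it only shows that the retained state grows with the number of groups, not the number of input rows.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Optional


@dataclass
class AggState:
    """Per-group aggregate state; the rows that produced it are not kept."""

    count: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def update(self, value: float) -> None:
        # Fold one input value into the running aggregates.
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def avg(self) -> float:
        # avg is derived from sum and count rather than stored separately.
        return self.total / self.count


class ContinuousView:
    """Hypothetical sketch of a continuously maintained GROUP BY result."""

    def __init__(self) -> None:
        self.groups: Dict[Hashable, AggState] = {}

    def insert(self, key: Hashable, value: float) -> None:
        self.groups.setdefault(key, AggState()).update(value)
        # The (key, value) row is not retained anywhere after this point.
```

Feeding the rows `("a", 1)`, `("a", 3)` and `("b", 10)` leaves just two small states behind: group `"a"` with count 2, sum 4, min 1, max 3 and avg 2, and group `"b"` with avg 10.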

PipelineDB has been used in production for nearly four years now and is used
by Fortune 100 companies.

~~~
doh
So once you ship it as an extension, is there any chance of mixing PipelineDB
with Citus in one cluster?

My hunch is that it's possible, as long as the Citus coordinator does some
additional computation to combine the aggregate query results.
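A minimal sketch of the kind of coordinator-side combine step that hunch implies, assuming each node keeps a mergeable partial state per aggregate. The function names are hypothetical, and this is the general partial-aggregation technique rather than how Citus or PipelineDB actually do it. The key point is that `avg()` must be merged from `(sum, count)` pairs, not by averaging the per-node averages.

```python
from typing import Iterable, Tuple

# Partial state for avg(): (sum, count), computed independently on each node.
Partial = Tuple[float, int]


def shard_partial(values: Iterable[float]) -> Partial:
    """Hypothetical worker-side step: reduce a node's rows to (sum, count)."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def combine(partials: Iterable[Partial]) -> Partial:
    """Coordinator-side step: partial states merge by componentwise addition."""
    total, count = 0.0, 0
    for s, c in partials:
        total += s
        count += c
    return total, count


def finalize_avg(state: Partial) -> float:
    # Only the final step divides; dividing earlier would lose the weights.
    total, count = state
    return total / count
```

With two nodes holding `[1, 2, 3]` and `[10]`, the partials `(6.0, 3)` and `(10.0, 1)` combine to `(16.0, 4)` and an average of 4.0, whereas averaging the per-node averages would give the wrong answer of 6.0.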

PPDB looks interesting, but we also need to keep the underlying raw data, and
multiple clusters require a more complex pipeline.

~~~
grammr
We haven't looked too far into integrations with any existing systems at this
point, but if there was significant user demand for it on both ends we'd
definitely be open to it.

One thing I will mention here is that we do have plans to add support for
persistent streams [0] after version 1.0.0 is released. We've learned a lot
over the years about how our users/customers interact with streams in
production and persistent streams will be built atop that foundation of
understanding.

Please feel free to comment on that issue with your use case, requirements,
etc. and we'll see what we can do!

[0]
[https://github.com/pipelinedb/pipelinedb/issues/1463](https://github.com/pipelinedb/pipelinedb/issues/1463)

~~~
doh
Persistent streams are interesting, but we have spent so many years refining our
ETL and building it around Citus that it would be very complicated to separate
the two. I will wait for the extension and do some testing.

------
airstrike
This would be absolutely perfect for the job I had in Sales Intelligence a few
years ago... except we were locked into SQL Server and there was no way the
powers that be would ever let us switch over to PostgreSQL.

~~~
manigandham
SQL Server 2017 has an in-memory (Hekaton) storage engine and columnstore
indexes. Combine them both and you can do the same thing with real-time
queries over the entire dataset.

------
crudbug
What is the storage model compared to TimescaleDB [0]?

[0]
[https://github.com/timescale/timescaledb](https://github.com/timescale/timescaledb)

~~~
Fergi
The storage engine for PipelineDB is PostgreSQL and the output of continuous
SQL queries (continuous views in PipelineDB) is stored in what are essentially
incrementally updated, real-time tables. You can also think of PipelineDB as
very high-throughput, incrementally updated materialized views.

See: [http://docs.pipelinedb.com/continuous-views.html](http://docs.pipelinedb.com/continuous-views.html)

------
Rapzid
Correct me if I'm wrong, but PipelineDB gives effective access to data in
commit order, right?

------
skunkwerk
Can't wait for support on RDS!

~~~
tejasmanohar
FWIW, AWS has a whitelist of Postgres extensions you can use in RDS so that'll
probably take more time, if it ever happens.

