
Citus and pg_partman: Creating a scalable time series database on Postgres - spathak
https://www.citusdata.com/blog/2018/01/24/citus-and-pg-partman-creating-a-scalable-time-series-database-on-PostgreSQL/
======
jnordwick
I wouldn't call this a time series database at all. To me, tsdb implies
analytics over a time dimension such as weather sensors or stock market data.

This is just patitioning a log on time so you can query the most recent and
delete the old stuff.

I doesn't even really seem to me you necessarily want to partition on time
since your load distribution is going to be terrible.

Edit: too add a little. There is a thing called a temporal database that is a
little more general in usage i feel in that it is more about facts at specific
points in time (such as your address last year) that i think this is more
about.

There is even a bitemporal database that has two time dimensions (what do we
think your last year's address is right now and what did we think your last
year's address was yesterday - and in those you don't ever delete data that is
wrong, you just update your belief about that point in time) and they are
really interesting to work with. Those would seem much more similar to this.

~~~
smilliken
Thanks for mentioning bitemporal databases, I hadn't heard of anything like
that before. Time-based mutable facts are so hard to represent well.

I think their definition of time-series database fits the common usage I've
seen everywhere: the data has a time dimension and is append-only/immutable
(well, ok, you can mutate the data in a postgresql table, but nobody's forcing
you to).

Given the choice between selecting a specialized time-series only database or
using a time-series pattern in your existing postgresql database, postgresql
is often (usually?) the more pragmatic choice. That's what we do at mixrank
with time-series tables approaching the 100 billions of rows.

~~~
sahil-kang
I also feel that using the bitemporal pattern on a Postgres DB is the most
pragmatic choice. What are some advantages to using a specialized timeseries
DB? I can’t really think of any.

~~~
smilliken
For one, you can avoid double-writing to disk by only having the log instead
of the WAL/log + table. You can save space by using a more compact binary
representation. Basically all performance/efficiency related.

~~~
nileshtrivedi
How about a merkle tree structure for storing data (like Git does)? This would
make it easy to find out what the snapshot at any given point of time was. Q
is, whether it is powerful enough to support typical data-oriented
applications?

------
polskibus
There was a post about timescale DB recently. Can you combine citusdb and
timescale somehow to get super fast timeseries data and citus flexibility?

~~~
craigkerstiens
Craig from Citus here. From our cursory looks it does look like they could
work together, though personally we've not tried it and I'm not aware of users
that have (though it's entirely plausible some have already). What you'd have
to do is setup timescale on each of your distributed nodes and tables. You'd
still talk to Citus as the primary point and it would re-write and push the
queries down to the nodes which could then leverage timescale.

------
mslot
Big shout-out to Keith Fiske for developing and maintaining pg_partman. It
makes Postgres 10's native partitioning really easy to use.

------
truth_seeker
How well it performs compared to TimescaleDB ?

~~~
mslot
In general I'd expect native partitioning to have similar performance
characteristics as TimescaleDB. On the insert side lot of the benefit comes
from partitions having small indexes, on the delete side from the ability to
drop partitions quickly, and on the select side from skipping partitions based
on filters.

Postgres 10 partitioning does have a few limitations and inefficiencies that
will be resolved in Postgres 11. Partitioning is the most actively developed
area of postgres.

Note that Citus shards across multiple nodes, and can then partition on disk
using native partitioning, which is automated by pg_partman. TimescaleDB so
far only works on a single node.

Citus can also run parallel, distributed SQL queries, perform distributed
transactions, and build rollups tables in parallel, and is used in Postgres
clusters with up to a petabyte of data.

------
brightball
To me, this seems like the ideal solution to so many multitenant
applications...

------
est
The problem with Citus and TimescaleDB is that users can not install
extensions on AWS RDS PostgreSQL.

~~~
pg314
That's a problem with AWS RDS PostgreSQL, not with Citus and TimescaleDB.

