> Building a distributed time-series database on PostgreSQL Next order of busine...

pvorb · on Aug 21, 2019

Which databases are good for analytics from your point of view?

In my experience, being able to do advanced ad-hoc SQL queries is priceless for analytics. Timescale helps scale time-series use cases that used to scale badly in plain PostgreSQL.
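As an illustration of the kind of ad-hoc query meant here, below is a minimal sketch using Python's built-in `sqlite3` module as a stand-in engine. That is an assumption made only to keep the example self-contained; it is not PostgreSQL or TimescaleDB. The `readings` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sensor readings: (unix timestamp in seconds, device id, temperature).
rows = [
    (0, "a", 20.0),
    (1800, "a", 22.0),
    (3600, "a", 30.0),
    (4000, "b", 10.0),
    (7300, "b", 14.0),
]

# In-memory SQLite database, used here only as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER, device TEXT, temp REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Ad-hoc question: average temperature per device per hour.
# Integer division of the timestamp groups rows into one-hour buckets.
query = """
    SELECT ts / 3600 AS hour, device, AVG(temp) AS avg_temp, COUNT(*) AS n
    FROM readings
    GROUP BY hour, device
    ORDER BY hour, device
"""
hourly = conn.execute(query).fetchall()

for hour, device, avg_temp, n in hourly:
    print(hour, device, avg_temp, n)
```

The point is that a new question (per-day maxima, per-device percentiles, joins against a metadata table) only needs a different query string, not a new data pipeline.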

manigandham · on Aug 21, 2019

There are other relational databases like MemSQL or ClickHouse that use distributed column-oriented architectures and are much better at large-scale analytics and aggregations.

Postgres is getting pluggable storage engines in the next version (and already has foreign data wrappers) so that can at least lead to a better storage design.

einpoklum · on Aug 21, 2019

It's not just my point of view - it's well known in the research community, and has been for decades.

For FOSS, have a look at MonetDB. For research-oriented systems, look for publications regarding HyPer or VectorWise/Actian Vector (VectorH in the cluster version). Other commercial offerings are Vertica (formerly C-Store) and SAP HANA.

PostgreSQL is not even something anyone compares against in analytics...

einpoklum · on Aug 22, 2019

Oh yeah, MemSQL and ClickHouse are indeed also relevant and in this category, except that ClickHouse doesn't support all of SQL or arbitrary table structures, so it's not a full-fledged DBMS.

akulkarni · on Aug 21, 2019

If you take a look at any of our benchmarks, you’ll see that this is not the case. PostgreSQL in fact can scale quite well for time-series analytics, if architected correctly.

But why don’t you just try out TimescaleDB and see for yourself?

einpoklum · on Aug 21, 2019

Please link to those benchmarks, and we'll see. A link to the relevant SIGMOD/VLDB/ICDE/DaMoN/ADMS/etc. submission arguing in favor of TimescaleDB's design would also be appreciated.

On the linked-to article I only see references to irrelevant transactional DBMSes...

mfreed · on Aug 21, 2019

Open-source Time-Series Benchmarking Suite: https://github.com/timescale/tsbs

InfluxDB: https://blog.timescale.com/blog/what-is-high-cardinality-how... https://blog.timescale.com/blog/timescaledb-vs-influxdb-for-...

Cassandra: https://blog.timescale.com/blog/time-series-data-cassandra-v...

MongoDB: https://blog.timescale.com/blog/how-to-store-time-series-dat...

einpoklum · on Aug 22, 2019

Yes, it's just like I thought. You're comparing against transaction-oriented DBMSes, or ones which handle documents rather than tabular data (and hence are slow on tabular data).

One possible exception is InfluxDB - I'm not familiar enough with it.

Anyway, try running TSBS on columnar DBMSes like Actian VectorH, Vertica, SAP HANA etc. ClickHouse may also be relevant; it doesn't support arbitrary schemas, but what it supports may be enough to run TSBS.

RobAtticus · on Aug 22, 2019

We're happy to take pull requests for new databases; so far we have received them for ClickHouse, CrateDB, and SiriDB (and one is pending). We've tried to make it relatively easy for new databases to hook in.

We usually implement ones that we hear about a lot from customers, and so far those haven't come up a ton. We'll keep it in mind though as we look to keep adding new ones.

oddtodd · on Aug 22, 2019

At the end of 2018 Altinity benchmarked ClickHouse against the TSBS and documented it.

https://www.altinity.com/blog/clickhouse-for-time-series

softwarelimits · on Aug 22, 2019

Thank you very much for your valuable input - it is very important that people understand the differences and look into this!

Performance comparisons to the candidates you named would be very interesting to see.

Downvoters: you should be happy that people with more knowledge than the average javascript-aws-webdevops-guy needed to operate a startup invest their time to inform you about alternatives you might not know about.

Also, it is important to keep this site attractive to people who have different opinions and experiences - do not do that Trump thing! Thanks!

Of course, for each claim replicable facts are needed.

XuMiao · on Aug 22, 2019

I am curious about that too. As a separate topic, if operational DBs were compatible with Parquet-type storage (for backup and restore), offline analytics and machine learning could be integrated seamlessly. Offline analytics can usually simplify online analytics: discovering new dimensions, normalization/denormalization, and optimization of indices and partitions. Operational DBs shouldn't have to do all of that under pressure.

AdamProut · on Aug 22, 2019

I think the ask was for comparisons against traditional analytics databases (Redshift, Vertica, etc.). Columnstores are substantially faster for table scans + aggregations than rowstores (and they use a lot less storage) [1].

[1] http://db.csail.mit.edu/projects/cstore/vldb.pdf
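To make the scan + aggregation point concrete, here is a toy sketch of the two layouts in plain Python. It is a model of the storage layout only, not a benchmark and not how any of the systems named in this thread are implemented. The table shape, column names, and the 8-bytes-per-value figure are assumptions for illustration.

```python
from array import array

# Hypothetical table with three columns: (ts, device_id, temp).
n = 1000
row_store = [(i, i % 10, float(i % 50)) for i in range(n)]

# Column layout: one contiguous typed array per column.
col_store = {
    "ts": array("q", (r[0] for r in row_store)),
    "device_id": array("q", (r[1] for r in row_store)),
    "temp": array("d", (r[2] for r in row_store)),
}

def sum_temp_rows(rows):
    # Row layout: every row is visited, although only one field is needed.
    return sum(r[2] for r in rows)

def sum_temp_cols(cols):
    # Column layout: only the "temp" array is read; other columns are untouched.
    return sum(cols["temp"])

# Rough amount of data a scan must read for SUM(temp),
# assuming 8 bytes per value and no compression.
bytes_row_scan = n * 3 * 8
bytes_col_scan = n * 8

print(sum_temp_rows(row_store), sum_temp_cols(col_store))
print(bytes_row_scan, bytes_col_scan)
```

Both functions return the same sum, but under these assumptions the columnar scan reads a third of the data. The gap grows with the number of columns the query does not touch, and values of a single type stored together also tend to compress better.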