The author mentioned write performance, but didn't touch much on read performance. Are there any benchmarks you can share in that direction?
Also the author didn't touch much on the space requirements or the node splitting expectations.
Those two areas would be useful to explain in greater detail.
Space requirements can admittedly be a bit higher than some other time-series databases or column stores. Since disk is cheaper than memory, and with easy ways of doing retention, we think it's a worthwhile trade-off for people for now. But compression is something we will look at down the road. For our later benchmarks we will include memory/space comparisons to help people evaluate.
As for node splitting, our clustering solution (not yet released) is being worked on quite a bit as well. When it's closer to being ready we'll include our thoughts on how to best manage your partitions.
Can you speak to the stability of your system for production use?
Full SQL interface
Scale-out distributed architecture
Auto partitions across space and time
Single-node and cluster optimizations
Complex query predicates
Distributed aggregations
Native JOINs with relational data
Labels and metrics support
Geo-spatial query support
Fine-grained access control
Reliable (active replication, backups)
Automated data retention policies
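To make "auto partitions across space and time" and "automated data retention" a bit more concrete, here is a rough sketch of the general idea. It is not TimescaleDB's actual implementation: the one-day interval, the four hash buckets, and the names `chunk_key` and `expired_chunks` are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone
import zlib

# Assumed values for illustration only, not TimescaleDB defaults.
CHUNK_INTERVAL = timedelta(days=1)   # width of one time slice
SPACE_PARTITIONS = 4                 # number of hash buckets for the space key
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_key(ts, device_id):
    """Map a row to a (time_slot, space_slot) pair identifying its chunk."""
    # Dividing two timedeltas with // gives an integer slot number.
    time_slot = (ts - EPOCH) // CHUNK_INTERVAL
    # A stable hash of the partitioning column picks the space bucket.
    space_slot = zlib.crc32(device_id.encode("utf-8")) % SPACE_PARTITIONS
    return time_slot, space_slot


def expired_chunks(chunks, now, keep):
    """Return the chunk keys whose whole time slice ends at or before now - keep."""
    cutoff = now - keep
    return [k for k in chunks if EPOCH + (k[0] + 1) * CHUNK_INTERVAL <= cutoff]
```

Under this scheme, retention amounts to dropping whole chunks rather than deleting individual rows, which is one reason time-based partitioning tends to make retention cheap.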
Also, here is the GitHub repo if you just want to see the code:
https://github.com/timescale/timescaledb/
I was investigating the same topic (a PG-based time-series database) for a stock tick data project, so I'd definitely give TimescaleDB a try.
Since financial data is mentioned in the blog, I'd be curious how it performed and scaled in practice.
We're still preparing some benchmark and performance numbers that we will hopefully share in the coming weeks. We do have some write performance numbers in there, as you can see. There is also a lot of churn at the moment as we're still in beta and refining some key features, so I don't want to speculate too much on how performance looks until after we get a few more of our query optimizations tuned.
https://github.com/keithf4/pg_partman
Additionally, TimescaleDB comes with optimizations on the query side specifically for time-series data.
So you distribute the child tables to several nodes of a server cluster.
Is network latency a problem? I guess one should colocate the servers in one location rather than spread it out?
How well does it work when nodes die?
Do you use query parallelization (available since 9.6 in vanilla) on a single node and across different nodes?
(We also don't currently support joins, while TimescaleDB's joins sound pretty dope :))
- Neither Aerospike nor Honeycomb.io supports full SQL queries. Each instead supports its own custom (and more limited) query format.
- Aerospike is not optimized for time-based queries and is more like a key-value store, so you cannot get the same performance.
- Honeycomb is a column store. Ours is built on Postgres and can work with your existing Postgres databases
As for performance, we are still working on gathering numbers that we hope to share soon comparing us against other solutions including Influx and similar.
One thing we'll note is that currently JOINs between two time-series tables (what we call 'hypertables') are not optimized, but we're working on it! :)
[1] http://docs.timescale.com/other-sample-datasets
[2] http://docs.timescale.com/getting-started/tutorial