Absolutely love the story. TimescaleDB & InfluxDB have had a lot of posts on HN,...

mpsq · on July 28, 2020

As you said, performance is the main differentiator. We are orders of magnitude faster than TimescaleDB and InfluxDB on both data ingestion and querying. TimescaleDB relies on Postgres and has great SQL support. InfluxDB does not, and this is where QuestDB shines: we do not plan to move away from SQL, and we are dedicated to bringing good SQL support, plus some enhancements, so that the query language is as flexible and efficient as possible for our users.

mfreed · on July 29, 2020

Hi, TimescaleDB cofounder here. Nice to read about your journey in time-series data, and always welcome another database that can satisfy a specific set of developer needs.

I also commend you on your desire to rebuild everything Postgres offers from scratch. We took a different route by building on top of Postgres (which e.g. allowed us to launch with native replication, rock-solid reliability, window functions, geospatial data, etc. without sacrificing performance). But there are many ways up this mountain!

As a quick thing, however: while it's not very representative of the workloads we typically see, I tried your simple 1B-row scan on a minimally configured hypertable in TimescaleDB/PostgreSQL, and got results that were >12x faster on my 8-core laptop than what you were reporting on a 48-core AWS m5.metal instance.

I think the Hacker News community always appreciates transparency in benchmarking; looking forward to reading a follow-up post where you share reproducible benchmarks in which all databases are tuned equivalently.

shay_ker · on July 28, 2020

I'm sure many folks would be really interested to see two things:

1. A blog post around a reproducible benchmark between QuestDB, TimescaleDB, and InfluxDB

2. A page, like questdb.io/quest-vs-timescale, that details the differences in side-by-side feature comparisons, kind of like this page: https://www.scylladb.com/lp/scylla-vs-cassandra/. Understandably, in the early days, this page will update frequently, but that level of transparency is really helpful to build trust with your users. Additionally, it'll help your less technical users to understand the differences, and it will be a sharable link for people to convince others & management that QuestDB is a good investment.

avthar · on July 28, 2020

Perhaps the QuestDB team could add it to the Time Series Benchmarking Suite [1]? It currently supports benchmarking 9 databases including TimescaleDB and InfluxDB.

[1] https://github.com/timescale/tsbs

mpsq · on July 28, 2020

This is a great idea, we will have a look! It is good to see that the ecosystem is moving towards a normalized / "standard" benchmarking tool.

PeterCorless · on July 30, 2020

Love it! And agree. Hopefully this has the same broad effect that the YCSB benchmark and Jepsen testing had in leveling the playing field.

PeterCorless · on July 30, 2020

Wow! Nice. I am surprised neither Scylla nor KairosDB are on that list. I think you could run Scylla by itself (to compare with raw Cassandra) and also re-run with KairosDB running on top of Scylla and Cassandra to see what effects that has on performance. (Though of course, there are advantages to having KairosDB, too.)

mekster · on July 29, 2020

Did you also compare to VictoriaMetrics?

j1897 · on July 29, 2020

Not yet. There is a benchmark against ClickHouse that was done by one of their contributors, though; see below in the comments.

hawk_ · on July 28, 2020

Do you do real-time streaming using SQL as well?

bluestreak · on July 28, 2020

Over-the-network streaming is not yet available. Someone has mentioned Kafka support; how useful would that be for streaming processed (aggregated) values and/or actual table changes?

avthar · on July 28, 2020

Are there any performance comparisons to TimescaleDB and Influx that you can share? A blog post perhaps?

j1897 · on July 28, 2020

Hi there, co-founder of QuestDB here. The demo on our website hosts a 1.6-billion-row NYC taxi dataset, along with 10 years of weather data at roughly 30-minute resolution and weekly gas prices over the last decade.

We've got example queries in the demo, and you can see their execution times there.

Some time ago we published a blog post comparing the ingestion speed of InfluxDB and QuestDB via the InfluxDB Line Protocol: https://questdb.io/blog/2019/12/19/lineprot

dominotw · on July 28, 2020

> We are orders of magnitude faster than TimescaleDB and InfluxDB

I think gp might be asking for a source for this claim.

I see execution times on the demo, but I'm not sure that's enough to say it's faster than Timescale.

mpsq · on July 28, 2020

j1897 is referring to https://questdb.io/blog/2020/04/02/using-simd-to-aggregate-b...

srini20 · on July 28, 2020

(Hard to draw many meaningful conclusions from a single, extremely simple query without much explanation?)

The graph shows PostgreSQL taking a long time, but doesn't say anything about configuration or parallelization. PostgreSQL has been able to parallelize that type of query since 9.6, but I think they didn't use parallelization in these experiments with PostgreSQL, even though they used a bunch of parallel threads with QuestDB?

So it would be good to know:

- What version of Postgres

- How many parallel workers for this query

- Whether the query was JIT-compiled

- Whether the cache in PostgreSQL was pre-warmed and configured to hold the data fully in memory (the QuestDB benchmarks appeared to make two passes, the first to mmap the data into memory, and to count only the second pass over in-memory data)

etc.

Database benchmarking is pretty complex (and easy to bias), and most queries do not look like this toy one.
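The configuration questions above (version, parallel workers, JIT, prewarming) can be pinned down in the benchmark script itself so readers can see them. A minimal sketch, assuming Python: `pg_benchmark_setup` is a hypothetical helper that only builds SQL strings, and `trips` is a hypothetical table name. `max_parallel_workers_per_gather`, `jit` and `pg_prewarm()` are real PostgreSQL settings/functions (JIT needs PostgreSQL 11+, and `pg_prewarm` needs its extension), but the values shown are placeholders, not the settings either team actually used.

```python
from typing import List


def pg_benchmark_setup(table: str, parallel_workers: int,
                       use_jit: bool, prewarm: bool) -> List[str]:
    """Build the SQL to run (and log) before timing a query. Hypothetical helper."""
    if parallel_workers < 0:
        raise ValueError("parallel_workers must be >= 0")
    stmts = [
        # Record the exact server version alongside the results.
        "SELECT version();",
        # Workers per Gather node for this session; 0 disables parallel query.
        f"SET max_parallel_workers_per_gather = {parallel_workers};",
        # JIT compilation of expressions (PostgreSQL 11+).
        f"SET jit = {'on' if use_jit else 'off'};",
    ]
    if prewarm:
        # Load the table into shared buffers before the timed runs.
        # `table` is assumed to be a trusted identifier (no quoting/escaping done).
        stmts.append("CREATE EXTENSION IF NOT EXISTS pg_prewarm;")
        stmts.append(f"SELECT pg_prewarm('{table}');")
    return stmts


for stmt in pg_benchmark_setup("trips", parallel_workers=12, use_jit=False, prewarm=True):
    print(stmt)
```

The per-session worker setting is still capped by the server-level `max_parallel_workers` and `max_worker_processes`, and prewarming only keeps the table in memory if `shared_buffers` is large enough, so those values belong in the published config too.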

mpsq · on July 29, 2020

I agree that our blog post lacks details; here are some:

- PostgreSQL 12

- 12 parallel workers

- No JIT

- We ran the test using the pg_prewarm [0] module; the difference was negligible

Regarding the "toy" query, we showcased it rather than more complex queries because it is a simple, easily reproducible benchmark. It provides a point of reference for performance figures.

> Database benchmarking is pretty complex (and easy to bias), and most queries do not look like this toy one.

I would say that benchmarking is very hard. We tried to avoid a biased benchmark by running a query that is not time-series specific and that does not give us an advantage over what Postgres should be able to do.

The takeaway is that configuration is important and we should expose it. The next benchmark we do will have an associated repository, so people can review our config and point out any non-optimal settings.

[0]: https://www.postgresql.org/docs/9.4/pgprewarm.html
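The two-pass methodology discussed in this thread (warm the data first, then time only later runs) is easy to make explicit in a published harness. A minimal sketch, assuming Python: `time_query` is a hypothetical helper, and `run_query` stands in for whatever driver call executes the query and consumes its full result set.

```python
import statistics
import time
from typing import Callable, List


def time_query(run_query: Callable[[], object],
               warmup_runs: int = 1, measured_runs: int = 5) -> dict:
    """Time a query callable, discarding warm-up runs. Hypothetical harness."""
    if measured_runs < 1:
        raise ValueError("measured_runs must be >= 1")
    # Warm-up passes pull data into the page cache / buffer pool; they are not timed.
    for _ in range(warmup_runs):
        run_query()
    timings: List[float] = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    # Report the warm-up count with the stats so cold vs. warm numbers are explicit.
    return {
        "warmup_runs": warmup_runs,
        "measured_runs": measured_runs,
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


print(time_query(lambda: sum(range(1_000_000)), warmup_runs=1, measured_runs=3))
```

In practice you would pass a lambda wrapping your driver's execute-and-fetch call for each database, with identical warm-up and run counts, so the numbers being compared come from the same kind of pass.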

gregmac · on July 28, 2020

I'd also be interested in hearing when QuestDB is not a good choice. Are there use cases where TimescaleDB, InfluxDB, ClickHouse or something else is better suited?

j1897 · on July 28, 2020

Hard question to answer, because each solution is unique and has its own tradeoffs. Taking a step back, QuestDB is a less mature product than the ones mentioned, so there are still many features, integrations, etc. to build on our side. This reflects how long we have been around and the capital we have raised, compared with those much larger companies.

GordonS · on Aug 2, 2020

If you are already using Postgres, then TimescaleDB is a natural fit - not having to deploy and manage a separate service is a real boon. You can also join with non-TimescaleDB tables, so if you need to combine time series data with regular relational data, that's another advantage.

bluestreak · on July 28, 2020

OLTP is not a good fit: QuestDB is not the right choice if your workload consists of INSERT/UPDATE/DELETE statements.