
Launch HN: QuestDB (YC S20) – Fast open source time series database - bluestreak
Hey everyone, I’m Vlad and I co-founded QuestDB (<a href="https:&#x2F;&#x2F;questdb.io" rel="nofollow">https:&#x2F;&#x2F;questdb.io</a>) with Nic and Tanc. QuestDB is an open source database for time series, events, and analytical workloads with a primary focus on performance (<a href="https:&#x2F;&#x2F;github.com&#x2F;questdb&#x2F;questdb" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;questdb&#x2F;questdb</a>).<p>It started in 2012 when an energy trading company hired me to rebuild their real-time vessel tracking system. Management wanted me to use a well-known XML database that they had just bought a license for. This option would have required taking down production for about a week just to ingest the data, and a week of downtime was not an option. With no more money to spend on software, I turned to alternatives such as OpenTSDB, but they were not a fit for our data model. There was no solution in sight to deliver the project.<p>Then I stumbled upon Peter Lawrey’s Java Chronicle library [1]. It loaded the same data in 2 minutes instead of a week, using memory-mapped files. Beyond the performance aspect, I found it fascinating that such a simple method solved multiple issues simultaneously: fast writes, reads that can happen even before data is committed to disk, code that interacts with memory rather than IO functions, and no buffers to copy. Incidentally, this was my first exposure to zero-GC Java.<p>But there were several issues. First, at the time it didn’t look like the library was going to be maintained. Second, it used Java NIO instead of using the OS API directly. This adds overhead, since it creates an individual object whose sole purpose is to hold a memory address for each memory page. Third, although the NIO allocation API was well documented, the release API was not: it was really easy to run out of memory and hard to manage memory page release.
I decided to ditch the XML DB and started to write a custom storage engine in Java, similar to what Java Chronicle did. This engine used memory-mapped files, off-heap memory and a custom query system for geospatial time series. Implementing this was a refreshing experience. I learned more in a few weeks than in years on the job.<p>Throughout my career, I mostly worked at large companies where developers are “managed” via itemized tasks sent as tickets. There was no room for creativity or initiative. In fact, it was in one’s best interest to follow the ticket&#x27;s exact instructions, even if they were complete nonsense. I had just been promoted to a managerial role and regretted it after a week. After so much time hoping for a promotion, I immediately wanted to go back to the technical side. I became obsessed with learning new things again, particularly in the high performance space.<p>With some money set aside, I left my job and started to work on QuestDB solo. I used Java and a small C layer to interact directly with the OS API without passing through a selector API. Although existing OS API wrappers would have been easier to start with, their overhead increases complexity and hurts performance. I also wanted the system to be completely GC-free. To do this, I had to build off-heap memory management myself and could not use off-the-shelf libraries; I had to rewrite many of the standard ones over the years to avoid producing any garbage.<p>When my first kid arrived, I had to take contracting gigs to make ends meet over the following 6 years. All the stuff I had been learning boosted my confidence, and I started performing well at interviews. This got me better-paying contracts, so I could take fewer jobs and free up more time to work on QuestDB while looking after my family. I would do research during the day and implement it in QuestDB at night.
I was constantly looking for the next thing that would take performance closer to the limits of the hardware.<p>A year in, I realised that my initial design was flawed and had to be thrown away. It had no concept of separation between readers and writers and would thus allow dirty reads. Storage was not guaranteed to be contiguous, and pages could be of various non-64-bit-divisible sizes. It was also very cache-unfriendly, forcing the use of slow row-based reads instead of fast columnar and vectorized ones. Commits were slow, and as individual column files could be committed independently, they left the data open to corruption.<p>Although this was a setback, I got back to work. I wrote the new engine to allow atomic and durable multi-column commits, provide repeatable read isolation, and make commits instantaneous. To do this, I separated transaction files from the data files. This made it possible to commit multiple columns simultaneously as a simple update of the last committed row id. I also made storage dense by removing overlapping memory pages and writing data byte by byte over page edges.<p>This new approach improved query performance. It made it easy to split data across worker threads and to optimise the CPU pipeline with prefetch. It unlocked column-based execution and additional virtual parallelism with SIMD instruction sets [2] thanks to Agner Fog’s Vector Class Library [3]. It made it possible to implement more recent innovations like our own version of Google&#x27;s SwissTable [4]. I published more details when we released a demo server a few weeks ago on Show HN [5]. This demo is still available to try online with a pre-loaded dataset of 1.6 billion rows [6]. Although it was hard and discouraging at first, this rewrite turned out to be the second best thing that happened to QuestDB.<p>The best thing was that people started to contribute to the project.
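The commit scheme described above (separate transaction metadata, commit as a single update of the last committed row id) can be sketched in a few lines. This is a simplified toy, not QuestDB's actual on-disk format; the file names, record layout and `ToyTable` class are invented for illustration:

```python
import os
import struct
import tempfile

class ToyTable:
    """Toy columnar store: per-column data files plus one txn file that
    holds a single 8-byte "last committed row count". A commit flushes the
    columns, then publishes the new count in one small write -- readers
    only ever look at rows below that count, so dirty rows are invisible."""

    def __init__(self, root):
        self.cols = {
            "price": open(os.path.join(root, "price.d"), "wb"),
            "size": open(os.path.join(root, "size.d"), "wb"),
        }
        self.txn_path = os.path.join(root, "_txn")
        self._write_txn(0)
        self.uncommitted = 0

    def _write_txn(self, row_count):
        with open(self.txn_path, "wb") as f:
            f.write(struct.pack("<q", row_count))

    def append(self, price, size):
        self.cols["price"].write(struct.pack("<d", price))
        self.cols["size"].write(struct.pack("<q", size))
        self.uncommitted += 1

    def commit(self):
        for f in self.cols.values():
            f.flush()
        # the commit itself is one atomic-looking publish of the row count
        self._write_txn(self.committed_rows() + self.uncommitted)
        self.uncommitted = 0

    def committed_rows(self):
        with open(self.txn_path, "rb") as f:
            return struct.unpack("<q", f.read(8))[0]

table = ToyTable(tempfile.mkdtemp())
table.append(1.5, 100)
table.append(2.5, 200)
assert table.committed_rows() == 0   # appended but uncommitted: readers see nothing
table.commit()
assert table.committed_rows() == 2   # both columns published together
```

The point of the design is that however many column files are involved, the visibility of new rows flips in one place.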
I am really humbled that Tanc and Nic left our previous employer to build QuestDB. A few months later, former colleagues of mine left their stable low-latency jobs at banks to join us. I take this as a huge responsibility and I don’t want to let these guys down. The amount of work ahead gives me headaches and goosebumps at the same time.<p>QuestDB is deployed in production, including at a large fintech company. We’ve been focusing on building a community to get our first users and gather as much feedback as possible.<p>Thank you for reading this story - I hope it was interesting. I would love to read your feedback on QuestDB and to answer questions.<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;peter-lawrey&#x2F;Java-Chronicle" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;peter-lawrey&#x2F;Java-Chronicle</a><p>[2] <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=22803504" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=22803504</a><p>[3] <a href="https:&#x2F;&#x2F;www.agner.org&#x2F;optimize&#x2F;vectorclass.pdf" rel="nofollow">https:&#x2F;&#x2F;www.agner.org&#x2F;optimize&#x2F;vectorclass.pdf</a><p>[4] <a href="https:&#x2F;&#x2F;github.com&#x2F;questdb&#x2F;questdb&#x2F;blob&#x2F;master&#x2F;core&#x2F;src&#x2F;main&#x2F;c&#x2F;share&#x2F;rosti.h" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;questdb&#x2F;questdb&#x2F;blob&#x2F;master&#x2F;core&#x2F;src&#x2F;main...</a><p>[5] <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=23616878" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=23616878</a><p>[6] <a href="http:&#x2F;&#x2F;try.questdb.io:9000&#x2F;" rel="nofollow">http:&#x2F;&#x2F;try.questdb.io:9000&#x2F;</a>
======
vii
mmap'd databases are really quick to implement. I implemented both row- and
column-oriented databases. The traders and quants loved it - and adoption
took off after we built a web interface that let you see a whole day and also
zoom into exact trades with 100ms load times for even the most heavily traded
symbols.

The main benefit of mmap'ing, and of POSIX filesystem atomicity properties in
general, is quick implementation: you don't have to worry about buffer
management. The filesystem and the disk block remapping layer (in SSDs, and
even HDDs now) are radically more efficient when data is given to them in
large contiguous chunks. This is difficult to control with mmap, where the OS
may write out pages at its whim. Even with advanced Linux system calls like
mremap and fallocate, which try to improve the complexity of changing
mappings and filesystem layout, this lack of control over buffers will
eventually bite you.

And then when you look at it, the kernel (with help from the processor TLB)
has to maintain complex data-structures to represent the mappings and their
dirty/clean states. Accessing memory is not O(1) even when it is in RAM.
Making something better tuned to a database than the kernel page management is
a significant hurdle but that's where there are opportunities.
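For a feel of the programming model being discussed, here is a minimal memory-mapped-file sketch using Python's stdlib `mmap` module (the same shape as the C `mmap`/`msync` calls; the file name and record layout are invented):

```python
import mmap
import os
import struct
import tempfile

# Append fixed-width records through a memory map: the code pokes bytes
# straight into mapped memory and the kernel decides when dirty pages
# actually reach the disk.
path = os.path.join(tempfile.mkdtemp(), "ticks.d")
n, width = 1000, 8
with open(path, "wb") as f:
    f.truncate(n * width)                   # pre-size the file (cf. fallocate)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), n * width)   # map the whole file
    for i in range(n):
        struct.pack_into("<q", mm, i * width, i * i)
    mm.flush()                              # msync: force dirty pages out
    # reads are plain memory accesses, no read() syscall per record
    tenth = struct.unpack_from("<q", mm, 10 * width)[0]
    mm.close()
```

The convenience is exactly the trade-off described above: no buffer management in your code, at the cost of leaving writeback scheduling to the kernel.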

~~~
bluestreak
Thank you for sharing! The core memory management is abstracted away: all of
the query execution logic is unaware of where a memory pointer comes from.
That said, we are still learning and really appreciate your feedback. There
are some places where we could not beat Julia's aggregation, but the delta
wasn't very big. This could have been down to mapped memory. We will
definitely try things with direct memory too!

~~~
vii
The databases I implemented experimented with various ways to compile
queries. It turns out that the JVM can run quite fast. I feel like LLVM
(Julia) is likely to be better for throughput and definitely better for
predictability of performance.

If you know layouts and sizes, then your generated code can run really fast -
using SIMD and not checking bounds is a win.

Hugepages would greatly reduce pagetable bookkeeping, but obviously may
magnify writes. Wish I could have tried that!
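The "known layouts and sizes" point can be illustrated even in plain Python (stdlib `array`; a compiled SIMD version would exploit the same property far harder, and the data here is invented):

```python
from array import array

# A column of 64-bit ints stored contiguously: every element has the same
# fixed width, so an aggregation streams straight through a flat buffer
# with no per-row parsing or branching -- the property that lets generated
# vectorized code use SIMD and skip bounds checks.
col = array("q", range(1_000_000))
total = sum(col)

# Contrast: a row-oriented list of dicts forces pointer chasing and a
# dictionary lookup per row to reach the same numbers.
rows = [{"v": i} for i in range(1000)]
row_total = sum(r["v"] for r in rows)
```

The columnar buffer is the layout a query compiler can reason about statically; the row-of-dicts shape is what it cannot.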

~~~
bluestreak
Our best performance currently is in C++ code, and LLVM is something we are
considering using to compute expressions such as predicates and select-clause
items. This is most likely to be way faster than what we can currently do in
Java. What I would like to know is whether LLVM can vectorize all the way to
AVX-512?

We also need to experiment with hugepages. The beauty is that if read and
write are separated - there is no issue with writes. They can still use 4k
pages!

------
shay_ker
Absolutely love the story. TimescaleDB & InfluxDB have had a lot of posts on
HN, so I'm sure others are wondering - how do we compare QuestDB to them? It
sounds like performance is a big one, but I'm curious to hear your take on it.

~~~
mpsq
As you said, performance is the main differentiator. We are orders of
magnitude faster than TimescaleDB and InfluxDB on both data ingestion and
querying. TimescaleDB relies on Postgres and has great SQL support. This is
not the case for InfluxDB, and this is where QuestDB shines: we do not plan
to move away from SQL. We are very dedicated to bringing good support and
some enhancements to make sure the query language is as flexible and
efficient as possible for our users.

~~~
shay_ker
I'm sure many folks would be really interested to see two things:

1. A blog post around a reproducible benchmark between QuestDB, TimescaleDB,
and InfluxDB

2. A page, like questdb.io/quest-vs-timescale, that details the differences
in side-by-side feature comparisons, kind of like this page:
[https://www.scylladb.com/lp/scylla-vs-
cassandra/](https://www.scylladb.com/lp/scylla-vs-cassandra/). Understandably,
in the early days, this page will update frequently, but that level of
transparency is really helpful to build trust with your users. Additionally,
it'll help your less technical users to understand the differences, and it
will be a sharable link for people to convince others & management that
QuestDB is a good investment.

~~~
avthar
Perhaps the QuestDB team could add it to the Time Series Benchmarking Suite
[1]? It currently supports benchmarking 9 databases including TimescaleDB and
InfluxDB.

[1] [https://github.com/timescale/tsbs](https://github.com/timescale/tsbs)

~~~
mpsq
This is a great idea, we will have a look! It is good to see that the
ecosystem is moving towards a normalized / "standard" benchmarking tool.

~~~
PeterCorless
Love it! And agree. Hopefully this has the same broad effect as when the YCSB
benchmark or Jepsen testing helped level the playing field.

------
pachico
I see this as a very interesting project. I use ClickHouse as my OLAP
database and I'm very happy with it. I can tell you the features that make me
stick to it. If some day QuestDB offers them, I might explore the possibility
of switching, but never before:

\- very fast (I guess we're aligned here)

\- real-time materialized views for aggregation functions (this is absolutely
a killer feature; being fast is quite pointless without it)

\- data warehouse features: I can join different data sources in one query.
This allows me to join, for instance, my MySQL/MariaDB domain DB with it and
produce very complete reports.

\- Grafana plugin

\- very easy to shard/scale at table level

\- huge set of functions, from geo to URL, from ML to string manipulation

\- dictionaries: I can load the MaxMind geo DB and do real-time localisation
in queries

I might add some more once they come to mind. Having said this, good job!!!

~~~
bluestreak
Thank you for the kind words and constructive feedback. We are here to build
on feedback like this. Grafana plugin is coming soon.

~~~
pachico
Glad to be useful. On the other side, I can tell you that ClickHouse also
lacks a feature everyone in its user community wishes for: automatic
rebalancing when you add a new node (sort of what Elasticsearch does).

And before I forget, the ClickHouse Kafka engine is simply brilliant. The
possibility of just publishing to Kafka and having your data not only
inserted into your DB but also pre-processed is very powerful.

Let me know if I can help you with use cases we have.

Cheers

~~~
santafen
Thanks for the helpful feedback! Feel free to reach out to chat more. I'm
super interested in more feedback from you. davidgs(at)questdb(dot)io

~~~
pachico
I certainly will. Cheers

------
thegreatpeter
Am I the only one that's like "wtf is a time-series database compared to a
normal one?"

~~~
avthar
This is actually an underrated question.

Time-series databases offer better performance and usability for dealing with
time-series data (think DevOps metrics, data from IoT devices, stock prices,
etc. - anything where you're monitoring and analyzing how things change over
time).

They allow you to answer questions where time is the main component of
interest much more quickly and easily:

eg 1: IoT Sensors) Show me the average of temperature over all my devices over
the past 3 days in 15 minute intervals

eg 2: Financial data) What's the price of stock X over the past 5 years

eg 3: DevOps data) What's the average memory and CPU used by all my servers of
the past 5 mins

A normal database could be purely relational (e.g. Postgres) or
non-relational (e.g. MongoDB). While you could use either for time-series
data, they tend to offer worse performance at scale and a worse experience
for common tasks (e.g. real-time aggregations of data, data retention
policies, etc.).

For more on time-series data and when you'd need a time-series database, check
out: [https://blog.timescale.com/blog/what-the-heck-is-time-
series...](https://blog.timescale.com/blog/what-the-heck-is-time-series-data-
and-why-do-i-need-a-time-series-database-dcf3b1b18563/)
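Example 1 above (average temperature in 15-minute intervals) can be sketched in plain Python; a time-series database does this server-side with a single SAMPLE BY/time_bucket-style clause. The readings here are made up:

```python
from collections import defaultdict

# Bucket (epoch-seconds, temperature) readings into 15-minute intervals
# and average each bucket -- the aggregation a TSDB runs for you.
readings = [(0, 20.0), (300, 22.0), (900, 30.0), (1200, 34.0)]

buckets = defaultdict(list)
for ts, temp in readings:
    buckets[ts // 900 * 900].append(temp)   # 900 s = 15 min

averages = {start: sum(v) / len(v) for start, v in sorted(buckets.items())}
# averages == {0: 21.0, 900: 32.0}
```

The database advantage is doing this over billions of rows, partitioned by time, without shipping raw data to the client.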

~~~
airstrike
> eg 2: Financial data) What's the price of stock X over the past 5 years

This is _so incredibly frustratingly slow_ to pull on FactSet and Capital IQ,
it makes me want to pull my hair every time I have to build line charts over
time for a period greater than 2 years

~~~
fawce
plug, but our system provides very fast access to price, fundamentals,
estimates, etc:
[https://factset.quantopian.com](https://factset.quantopian.com)

------
hintymad
I'm curious how QuestDB handles dimensions. OLAP support with a reasonably
large number of dimensions and cardinality in the range of at least thousands
is a must for a modern-day time series database. Otherwise, what we get is
only an incremental improvement over Graphite -- a darling among startups, I
understand, but a non-scalable, extremely hard to use timeseries database
nonetheless.

A common flaw I see in many time-series DBs is that they store one time
series per combination of dimensions. As a result, any aggregation will
result in scanning potentially millions of time series. If a time-series DB
claims that it is backed by a key-value store, say, Cassandra, then the DB
will have the aforementioned issue. For instance, Uber's M3 used to be backed
by Cassandra, and therefore would give this mysterious warning that an
aggregation function exceeded the quota of 10,000 time series, even though
from the user's point of view the function dealt with a single time series
with a number of dimensions.

~~~
roskilli
FYI, M3 is now backed by M3DB, a distributed, quorum read/write replicated
time-series columnar store specialized for realtime metrics. You can
associate multiple values/timeseries with a single set of dimensions if you
use Protobufs to write data; for more, see the storage engine
documentation[0]. The current recommendation is not to limit your queries but
to limit the global data queried per second[1] by a single DB node, using a
limit on the number of datapoints (inferred from blocks of datapoints per
series). M3DB also uses an inverted index built on mmap'd FST segments[2],
similar to Apache Lucene and Elasticsearch, to make multi-dimensional
searches on very large data sets fast (hundreds of trillions of datapoints,
petabytes of data). This is a bit different from traditional columnar
databases, which focus on column stores and are rarely accompanied by a
full-text search inverted index.

[0]:
[https://docs.m3db.io/m3db/architecture/engine/](https://docs.m3db.io/m3db/architecture/engine/)

[1]:
[https://docs.m3db.io/operational_guide/resource_limits/](https://docs.m3db.io/operational_guide/resource_limits/)

[2]:
[https://fosdem.org/2020/schedule/event/m3db/](https://fosdem.org/2020/schedule/event/m3db/),
[https://fosdem.org/2020/schedule/event/m3db/attachments/audi...](https://fosdem.org/2020/schedule/event/m3db/attachments/audio/4032/export/events/attachments/m3db/audio/4032/FOSDEM_2020_Querying_millions_to_billions_of_metrics_with_M3DBs_inverted_index.pdf)
(PDF)

~~~
ignoramous
Recommended reading on FST for the curious:
[https://blog.burntsushi.net/transducers/](https://blog.burntsushi.net/transducers/)

~~~
roskilli
Thank you for mentioning that, Andrew's post is really fantastic covering many
things altogether: fundamentals, data structure, real world impact and
examples.

------
numlock86
>
> [https://news.ycombinator.com/item?id=22803504](https://news.ycombinator.com/item?id=22803504)

As I already said or rather asked there: Assume I already use Clickhouse for
example. What are the benefits of QuestDB? Why should I use it instead?

Surely it's a good tech and competition is key. But what are the key points
that should make me look into it? There is a lot of story about the making and
such, but I don't see the "selling point".

~~~
j1897
Hey, one of the key differences here is that Clickhouse is owned by a large
corporation, Yandex (the Google of Russia), and its roadmap seems to be
driven by the needs of that company. We are committed to our community and
drive our roadmap based on their needs, rather than having to fulfill the
needs of a parent company.

Ultimately, as a result, we think that QuestDB will be a better fit for the
community. We acknowledge that Clickhouse has a lot more features as of now,
being a more mature product.

------
maz1b
Hi Vlad, this looks really interesting!

I really enjoyed reading the backstory and the founding dynamics upon QuestDB
was born and I think a lot of others in the YC community will as well.

Can you give some use cases or specific examples of why QuestDB is unique?

~~~
bluestreak
Thanks! What differentiates us from other time series databases is
performance, both for ingestion and queries. For example, we can ingest
application performance metrics via the Influx Line Protocol and query them
via SQL, and both should be faster than the incumbents.

------
didip
I find your story very interesting, thank you for sharing that.

It also gives an interesting background as to why questdb is different than
all the other competitors in the space.

~~~
bluestreak
Thank you for the kind words!

------
judofyr
Congratulations on launching! It looks like a great product. Some technical
questions which I didn’t see answered on my first glance:

(1) Is it a single-server only, or is it possible to store data replicated as
well?

(2) I’m guessing that all the benchmarks were done with all the hot data paged
into memory (correct?); what’s the performance once you hit the disk? How much
memory do you recommend running with?

(3) How’s the durability? How often do you write to disk? How do you take
backups? Do you support streaming backups? How fast/slow/big are snapshot
backups?

~~~
bluestreak
thank you!

\- replication is in the works; it is going to be both TCP- and UDP-based,
column-first, and very fast.

\- yes, benchmarks are indeed done on a second pass over the mmap'd pages. A
first pass would trigger IO, which is OS-driven and dependent on disk speed.
We've seen well over 1.5Gb/s on disks that support this speed. Columns are
mapped into memory separately and lazily accessed, so the memory footprint
depends on what data your SQL actually lifts. We go quite far to minimize
false disk reads by working with rowids as much as possible. For example,
'order by' will need memory for 8 x row_count bytes in most cases.

\- durability is something we want users to have control over. Under the hood
we have these commit modes:

[https://github.com/questdb/questdb/blob/master/core/src/main...](https://github.com/questdb/questdb/blob/master/core/src/main/java/io/questdb/cairo/CommitMode.java)

NOSYNC = the OS flushes memory whenever it chooses. That said, we use a
sliding 16MB memory window when writing, and flushes are triggered by
unmapping pages.

ASYNC = we call msync(MS_ASYNC)

SYNC = we call msync(MS_SYNC)
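The rowid trick behind the "8 x row_count bytes for 'order by'" figure can be illustrated: sort an array of 64-bit row ids by the key column instead of rearranging the rows themselves. A toy sketch with invented data, not QuestDB's implementation:

```python
from array import array

# Two columns of a tiny table; sorting by price never moves this data.
price = array("d", [9.5, 1.25, 7.0, 3.5])
qty = array("q", [10, 20, 30, 40])

# The only extra memory is one 8-byte row id per row ("8 x row_count"):
rowids = array("q", sorted(range(len(price)), key=lambda r: price[r]))

# Rows are then materialized in sorted order by indirection through rowids.
ordered = [(price[r], qty[r]) for r in rowids]
# ordered == [(1.25, 20), (3.5, 40), (7.0, 30), (9.5, 10)]
```

Sorting ids instead of rows keeps the column files untouched and read-only, which also fits the reader/writer separation described elsewhere in the thread.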

~~~
biztos
Definitely enjoyed the story and I find the product interesting! I especially
like the time-series aggregation clauses since it makes it easy to "think in
SQL."

I was also going to ask about replication. Any idea when it's going to be
done?

Oh and kudos for the witty (previous) company name: Appsicle, haha, love that.

~~~
patrick73_uk
Hi, I'm a questdb dev working on replication, we should have something working
within a couple of months. If you have any questions feel free to ask me.

------
zumachase
Hi Vlad - your anecdote about ship tracking is interesting (my other startup
is an AIS based dry freight trader). You must know the Vortexa guys given your
BP background.

How does QuestDB differ from other timeseries/OLAP offerings? I'm not entirely
clear.

~~~
bluestreak
Thank you - life is an interesting experience :) I used to work with Fabio,
Vortexa's CEO, and had to turn down an offer to be their first employee in
order to focus on QuestDB. They are an absolutely awesome bunch of guys and
deserve every bit of their success!

What makes QuestDB different from other tools is the performance we aim to
offer. We are completely open about how we achieve this performance, and we
serve the community first and foremost.

------
jrexilius
This looks great, but more importantly, good luck! There seems to be a market
need for this, and it looks like a solid implementation at first glance.
You're off to a good start. I hope you and your team are successful!

~~~
santafen
Thanks!

------
sylvain_kerkour
Congrats!

Also thank you for your awesome blog[0]! It's really the kind of technical gem
I enjoy reading late at night :)

[0] [https://questdb.io/blog](https://questdb.io/blog)

------
aloukissas
This is great! Quick question: would you mind sharing why you went with Java
vs something perhaps more performant like all C/C++ or Rust? I'd suspect
language familiarity (which is 100% ok).

~~~
bluestreak
Java was the starting point. Back in the day, Rust wasn't a thing and C++
projects were quite expensive to maintain. What Java gives us is IDE support,
instant compilation times and super easy test coverage. For things that do
require ultimate performance we use C/C++, though. These libraries are
packaged with Java and are transparent to the end user.

~~~
aloukissas
Makes sense, that's what I also guessed.

------
neurostimulant
Congrats! I've been looking for a time series database but most of them seems
to be in-memory nosql databases. QuestDB might be exactly what I need. I'll
definitely give it a try soon!

~~~
pachico
It would then be in your interest to know ClickHouse. I recommend you have a
look at it.

~~~
j1897
We've had one of their contributors benchmark QuestDB versus Clickhouse
recently - you can find the results here:
[https://github.com/questdb/questdb/issues/436](https://github.com/questdb/questdb/issues/436)

This came out of a benchmark we had on our previous website comparing the two
on summing 1 billion doubles.

------
pknerd
Stories like these help a product to get traction. Every founder/creator must
come up with a story related to the product.

Congrats!

------
anurag
Amazing story and congrats on all the progress!

Shameless plug: if you'd like to try it out in a production setting, we just
created a one-click install for it:

[https://github.com/render-examples/questdb](https://github.com/render-
examples/questdb)

------
airstrike
There's an opportunity for a tool that combines this sort of technology in the
backend with a spreadsheet-like GUI powered by formulas and all the user
friendliness that comes with a non-programmer interface. Wall Street would
forever be changed. Source: I'm one of the poor souls fighting my CPU and RAM
to do the same thing with Excel and non-native add-ins by {FactSet, Capital
IQ, Bloomberg}

This stuff

    
    
        SELECT * FROM balances
        LATEST BY balance_ccy, cust_id
        WHERE timestamp <= '2020-04-22T16:15:00.000Z'
        AND NOT inactive;
    

Makes me literally want to cry for knowing what is possible yet not being able
to do this on my day job :(
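The LATEST BY query above keeps, for each (balance_ccy, cust_id) pair, the most recent active row at or before the cutoff. Its semantics can be emulated in plain Python (the sample balances are invented):

```python
# Emulate LATEST BY: for each (ccy, cust_id) key, keep the newest row with
# timestamp <= cutoff and inactive == False. ISO-8601 strings compare
# correctly as plain strings.
balances = [
    ("USD", 1, "2020-04-22T16:00", 100.0, False),
    ("USD", 1, "2020-04-22T16:10", 150.0, False),
    ("EUR", 1, "2020-04-22T16:05", 80.0, False),
    ("USD", 2, "2020-04-22T16:20", 50.0, False),   # after cutoff -> ignored
]
cutoff = "2020-04-22T16:15"

latest = {}
for ccy, cust, ts, amount, inactive in balances:
    if ts <= cutoff and not inactive:
        key = (ccy, cust)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, amount)
```

A database with a designated timestamp can answer this with an index walk instead of a full scan, which is why the clause is fast.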

~~~
bluestreak
We are working on building solid PostgreSQL support, to the point of allowing
an ODBC driver to execute this type of query from Excel. This is work in
progress with not that much left to do.

~~~
airstrike
Awesome! I think about this almost on a daily basis, and I could very well be
wrong, but from my perspective the killer feature is integrating the querying
with the financial data providers I mentioned above, so they could sell the
whole thing as the final product to end users. (EDIT: from a reply to another
comment, it seems like some people are onto the concept:
[https://factset.quantopian.com](https://factset.quantopian.com))

If you ever install FactSet for a trial period and try querying time series
with even ~10,000+ data points, you'd be amazed at how long it takes, how
sluggish it is and how often Excel crashes.

My _real_ perspective is Microsoft should roll something similar out as part
of Excel and also get in the business of providing the financial data as they
continue the transition into services over products

------
rattray
The SQL explorer at [http://try.questdb.io:9000/](http://try.questdb.io:9000/)
is pretty slick – was that built in-house, or is it based on something that's
open-source?

~~~
bluestreak
Thanks! This is something we built in house. In fact, ‘mpsq’ did all of that.
All credit should go to him!

------
rattray
The database aside entirely, that story was a really fun read. Thanks for
writing it up and sharing. Rooting for you!

------
monstrado
I noticed there is "Clustering" mentioned under enterprise features, but I
can't seem to find any references to it in the documentation. Is this
something that will be strictly closed source?

~~~
bluestreak
There will be two different flavors of replication:

\- TCP-based replication for WAN

\- UDP-based replication for LAN and high-traffic environments

We are currently building the foundational elements of this replication, such
as column-first and parallel writes. These will go into, and always be part
of, QuestDB. TCP replication will go on top of this foundation and will also
be part of QuestDB. UDP-based replication will be part of a different product
we are building, named Pulsar.

~~~
monstrado
Thanks for your response! Last question...

Will the clustering target just replication (HA) or will it also target
sharding for scaling out storage capacity?

~~~
bluestreak
:)

Eventually both. We are starting with baby steps, e.g. get data from A to B
quickly and reliably. Replication/HA will be first of course. Then we want to
scale queries across multiple hosts. Since all nodes have the same data - they
may as well all participate. Sharding will be last. We are thinking of taking
a route of virtualizing tables. Each shard can be its own table and SQL
optimiser can use them as partitions of single virtual table. We already take
single table and partition it for execution. Sharding seems almost like a
natural fit.

------
gregwebs
I am still hoping to see comparisons to Victoria Metrics, which also shows
much better performance than many other TSDBs. Victoria Metrics is Prometheus
compatible, whereas QuestDB now supports Postgres compatibility. Both are
compatible with InfluxDB.

The Victoria Metrics story is somewhat similar where someone tried using
Clickhouse for large time series data at work and was astonished at how much
faster it was. He then made a reimplementation customized for time series data
and the Prometheus ecosystem.

~~~
mpsq
This is on the roadmap, we will work on integrating with
[https://github.com/timescale/tsbs](https://github.com/timescale/tsbs), TSBS
has Victoria Metrics support too.

------
jedberg
How does your performance compare to Atlas? [0]

[0] [https://github.com/Netflix/atlas](https://github.com/Netflix/atlas)

~~~
mpsq
I have not tried to benchmark Atlas, but I am not sure the result would be
meaningful. Atlas is an in-memory database and QuestDB persists to disk, so
the two are not directly comparable.

~~~
jedberg
Atlas persists to disk too. Netflix stores trillions of data points in it.

It stores recent data in memory for increased performance which is replicated
across instances and then persists to S3 for long term storage, making
aggregates queryable and full resolution data available with a delay for a
restore from storage.

------
mooneater
Awesome! Could you share a bit about business model?

~~~
j1897
hi, co-founder of QuestDB here.

QuestDB is open source and therefore free for everybody to use. A separate
product, which uses QuestDB as a library and adds features typically required
for massive enterprise deployments, will be distributed and sold to companies
as a fully managed solution.

Our idea is to empower developers to solve their problems with open source
QuestDB, and for those developers to then push the product within their
organisation bottom-up.

------
Random_ernest
Testing out the demo:

SELECT * FROM trips WHERE tip_amount > 500 ORDER BY tip_amount DESC

Very interesting :-)

~~~
120bits
For some reason this query is taking too long to execute. Not sure if I missed
something.

~~~
santafen
When I ran it it took about 20s total.

------
posedge
Your story is very inspiring. I wish you all the best with this project.

~~~
santafen
Thanks for the kind words!

------
TheRealNGenius
Maybe I'm out of the loop, but I noticed lately that a majority of show/launch
hn posts I click on have text that is muted. I know this happens on down voted
comments, but is this saying that people are down voting the post itself?

~~~
GordonS
I've noticed this when I post "Ask HN" questions too - text seems to be muted
immediately though, regardless of votes. It's kind of confusing since, as you
say, elsewhere this means a comment has been downvoted.

------
einpoklum
1\. Does QuestDB support SQL sufficiently to run, say, the TPC-H analytics
benchmark? (not a time series benchmark)

2\. If so, can you give some performance figures for a single machine and
reasonable scale factors (10 GB, 100 GB, 1000 GB)? Execution times for single
queries are even more interesting than the overall random-queries-per-hour
figure.

3\. Can you link to a widely-used benchmark for analytic workloads on time
series, which one can use to compare performance of various DBMSes? With SQL-
based queries preferably.

------
patrickaljord
Congrats on the launch!

One question: there are many open source database startups that make it easy
to scale on the cloud. However, when you look into the offering, the scaling
part is never actually open source, and you end up paying for non-open-source
components just like with any other proprietary database. So I guess my
question is: are you planning to go open core too, or will you remain open
source with a SaaS offering? Good luck to you!

~~~
j1897
QuestDB is open source and will remain so forever. You will be able to scale
with it. Our commercial product, Pulsar, uses QuestDB as a library and will
offer enterprise integration and monitoring features, which are typically
required for massive enterprise deployment.

To answer your question, it will depend on how big a scale we are looking at.
If you are a large company running QuestDB throughout the organization and
need a specific feature set to do so, you will probably want our paid
offering.

~~~
nwsm
When I read Pulsar I think Apache Pulsar.

------
js4ever
[https://try.questdb.io:9000/](https://try.questdb.io:9000/) is down

~~~
bluestreak
[http://try.questdb.io:9000/](http://try.questdb.io:9000/)

------
bravura
Can you talk about some of the ideal use cases for a time series db? Versus
Postgres or a graph database.

~~~
santafen
Great question! Time series databases are a good fit for applications that
need to process streams of data. IoT is a popular use case, as are DevOps and
infrastructure monitoring applications. As mentioned in other comments here,
there are a lot of use cases in financial services as well.

These are all applications where you’re dealing with streams of time-stamped
data that need to be ingested, stored, and queried in huge volumes.

------
samsk
Does it support some kind of compression? That's very important when storing
billions of events.

~~~
mpsq
Not yet, but this is on the roadmap. In the meantime you could use a
filesystem that supports compression, such as ZFS or btrfs. The data is
column-oriented, which means compression would be very efficient.

------
lpasselin
Does postgres wire support mean QuestDB can be a drop-in replacement for a
postgres database?

Is this common?

~~~
santafen
QuestDB Head of DevRel here ... Yes, it can be a replacement for Postgres, and
it will be cheaper and faster. That said, PGwire support is still in alpha and
not 100% covered yet, so migrating is possible but full Postgres wire protocol
compatibility is not there yet.

For traditional transactional RDBMS workloads, I don't think it's a very
common choice. For time series data, QuestDB is by far the fastest of the
Postgres-compatible SQL time series databases.

~~~
jaydub
Yeah, I checked it out and wanted to use it, but a bunch of regular old SQL
queries don't work. Please add support for the old-fashioned GROUP BY syntax!
(This will be helpful for getting to a true drop-in replacement!)

~~~
joan4
QuestDB dev here. We added support for GROUP BY syntax in yesterday's release.
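For reference, the "old-fashioned" syntax in question is the explicit GROUP BY
clause. A minimal illustration using SQLite as a stand-in (table and column
names are hypothetical, loosely modeled on the taxi "trips" demo mentioned
earlier in the thread):

```python
import sqlite3

# In-memory SQLite database as a stand-in to show the standard syntax.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trips (passenger_count INTEGER, tip_amount REAL)")
con.executemany("INSERT INTO trips VALUES (?, ?)",
                [(1, 2.0), (1, 4.0), (2, 3.0)])

# Explicit GROUP BY: average tip per passenger count.
rows = con.execute(
    "SELECT passenger_count, avg(tip_amount) AS avg_tip "
    "FROM trips GROUP BY passenger_count "
    "ORDER BY passenger_count").fetchall()
print(rows)  # [(1, 3.0), (2, 3.0)]
```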

------
fredliu
How do you get the best performance out of QuestDB? Does it have to be on bare
metal machines? Is there any performance benchmark of QuestDB running on bare
metal vs. cloud instances (e.g. EC2 with EBS volumes) etc.?

~~~
bluestreak
QuestDB is quite CPU-intensive. For example, it is not a good idea to put two
threads on hyperthreaded logical cores that share the same physical core. It
is also not a good idea to put threads on cores that belong to different
physical CPUs in a multi-socket server. With a bare metal box you have
visibility into all of these conditions; on virtualized boxes it really
depends on your luck. That said, if no one else is hammering the cores QuestDB
is hammering, performance will be very similar to a bare metal box.
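One way to control core placement is to pin the process to an explicit core
set. A Linux-only sketch (the core id used here is an assumption; on a real
box, pick one logical core per physical core after checking the sibling
layout in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list):

```python
import os

# Restrict the current process (pid 0 = self) to a chosen core set so the
# scheduler cannot migrate it onto an unwanted logical core. {0} is a
# placeholder core set for illustration.
os.sched_setaffinity(0, {0})
print(sorted(os.sched_getaffinity(0)))  # [0]
```

From the shell, `taskset -c 0,2,4,6 java ...` achieves the same for a JVM
process (that core list is also an assumption; map it to your actual
topology first).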

------
myth_drannon
[https://questdb.io/docs/crudOperations](https://questdb.io/docs/crudOperations)
has JS errors and is not loading (page not found).

~~~
mpsq
Thanks for reporting this! This is an old link, please use
[https://questdb.io/docs/guide/crud](https://questdb.io/docs/guide/crud)
instead. I am currently updating the README and removing all dead links.

------
jankotek
Good luck. I have worked on a similar open source database engine for about a
decade now. It is not bad, but I think consulting is a better way to get
funds. Also, avoid "zero GC"; the JVM can be surprisingly good.

Will be in touch :)

~~~
mpsq
Thanks Jan!

------
nlitened
Do you measure performance vs k/shakti?

~~~
santafen
No one is allowed to benchmark them except them. :-)

------
dominotw
something is off with your website. I just see images
[https://questdb.io/blog/2020/07/24/use-questdb-for-
swag/](https://questdb.io/blog/2020/07/24/use-questdb-for-swag/)

~~~
mpsq
What browser are you using?

~~~
dominotw
chrome on osx Version 84.0.4147.89 (Official Build) (64-bit)

works fine in safari, something is up with the dark theme.

~~~
mpsq
I tried with the same setup and it works fine. I tried to disable JS too and
it's OK. Could it be a rogue extension?

~~~
dominotw
ah yea you are right. I had high contrast extension messing with this page.

------
wappa
How do I join the Slack group? It says to request an invite from the workspace
administrator.

~~~
j1897
If you click on "Join Slack" at the top right of our website, you'll join our
public channels!

~~~
pupdogg
Unable to join your slack here...Slack is stating: "Contact the workspace
administrator for an invitation"

------
nmnm
Loved the story and the product!

------
jeromerousselot
Great story! Thanks for sharing

~~~
joan4
Thanks Jerome!

------
massimosgrelli
Impressive. Can we talk?

~~~
j1897
Sure, could you join our Slack? Top right of our website: www.questdb.io

~~~
pupdogg
Unable to join your slack here...Slack is stating: "Contact the workspace
administrator for an invitation"

~~~
santafen
Head of DevRel for QuestDB here. Drop me an email at davidgs(at)questdb(dot)io
and I'll make sure you get invited.

------
rbruggem
great story! well done.

------
monstrado
Any plans on integration with Apache Arrow?

~~~
bluestreak
It has been asked here:
[https://github.com/questdb/questdb/issues/261](https://github.com/questdb/questdb/issues/261).
Definitely on our roadmap. It would be great if you could share why you need
Arrow.

~~~
monstrado
No urgent reason. I've noticed a decent number of technologies have adopted it
in some way or another. I could imagine it being useful for integrating
QuestDB with existing internal systems that use Arrow as their
in-memory/interchange format.

Appreciate the issue link :)

------
tosh
kudos @ launching, impressive

~~~
santafen
Thanks so much!

------
Maro
Can you add a tldr?

~~~
kgraves
this. i'm sure qdb is a great product, but i can't even stomach reading long
lines and walls of text...

