Hacker News new | past | comments | ask | show | jobs | submit login
Using SIMD to aggregate billions of values per second (questdb.io)
171 points by bluestreak 56 days ago | hide | past | web | favorite | 70 comments

QuestDB co-founder and CTO here - happy to share questdb, a performance-driven open-source time-series database that uses SQL.

High performance databases have a reputation of being inaccessible. They are expensive, closed-source, and require complex proprietary languages.

We have made our code available under Apache 2.0. Under this new release, QuestDB leverages SIMD instructions, vectorizations and parallel execution to achieve performance figures at the top of the high-performance databases. Results are shown in our blog post.

I sincerely hope that you will find this helpful, and that this will unlock new possibilities! In the meantime, we will carry on working on QuestDB, adding more features, and taking this speed to more query types. Feel free to join us on slack or Github if you want to participate.

I'll have to be careful here as I've no experience either with your DB or Postgres, but comparing questDB with PG and using that to claim it's 100X faster may be technically true but a bit of a dirty trick. It's perhaps like comparing a lorry with an F1 car and being surprised at the winner.

I'm also a little surprised at the speedup using SIMD. If it's RAM constrained, and it clearly is because it scales with the number of memory channels, then I completely fail to understand how instructions - which are bloody fast to execute (SIMD or not) compared to slow RAM access - could benefit you. Both SIMD or not should be waiting on memory AFAICS. Except you clearly do get a big speedup, could anyone enlighten me on this?

Serious but bold question: What are the benefits versus Clickhouse for example? Why should I use QuestDB?

Amazing work either way. The space of databases can never have too much competition.

It is hard for me to say right now as we did not benchmark against Clickhouse yet, this is clearly the most requested comparison. We will come back on this!

This looks super interesting!

1) How does this compare to BlazingDB/MapD/etc, especially in queries on <some benchmark> per $ per hour on AWS?

2) I don't see a way to give you money. What ensures that QuestDB will be around in the future if I architect my data pipelines around this product?

Thank you!

1) We didn't compare directly, but our aim is to squeeze as much performance out of hardware as possible to minimize cloud cost. If we are not the best at this right now - we are going to be the best.

2) We are VC funded with 24 months runway. The product is Open Source Apache 2.0 and will remain as such forever. Any future commercial offering will use open source product as a library.

On point 2, I hope you have a good monetisation strategy. I’m a bit wary of new open source databases that don’t have a solid strategy for when their funding runs out, too many times have projects been more or less abandoned after funding ran out, despite being open source (eg Rethinkdb), so I’m rather wary of relying on something until it’s been well established. (Although even basho eventually shut down despite being well established...)

Thank you for advancing the state of the art! Do you consider offloading some code to the gpu at some point?

JITs are really good at doing speculative optimization then deoptimizing if the optimization failed. Couldn't DBs speculatively offload to the gpu (with coherent memory (HSA, etc) access) at least under some constraints/patterns?

Thank you for your kind comment! We would consider offloading to a GPU when the time is right. It feels right to fully utilise CPU capabilities before jumping to GPU.

Just wanted to point out a small typo in the docs: "LASTEST BY" (https://www.questdb.io/docs/crudOperations)

thanks a lot! fixed

I'm curious what your business model is. Don't you worry that releasing your product for free will cut into your revenue?

In the future, we will release other products with features geared towards enterprises. We don't think that having our code open will harm the business.

On the contrary, it will hopefully help us gain some popularity by allowing any developer to access latency levels only seen in trading. We hope that QuestDB will allow them to stop worrying about database performance bottlenecks.

It would be great if you include clickhouse in your benchmark. It also boasts heavy SIMD use and is free + open source.

not to hijack too much, but since this is on the topic of timeseries...i'm currently working on a fast* Canvas2D timeseries chart:


* ~4,000 pts/ms on an i5 and integrated gpu

This is super interesting and quite relevant to a bunch of stuff I do at work.

Next time I have a project that involves real-time plotting of data, I'll definitely take a look!

Why canvas2d when WebGL is much faster?

a few reasons.

the main one is that i know canvas and can implement everything i need, but would need to learn a lot of webgl to get any further than a basic PoC.

browsers limit how many webgl contexts you can acquire. chrome is capped at 16, so if you need more than 16 charts on a page, you're out of luck.

initializing webgl is actually slower than canvas 2d.

it's difficult to draw lines of different thicknesses in raw webgl - it takes a lot of code.

there are projects which abstract webgl nicely to provide canvas-like ergonomics, like Two.js, Pixi.js, or Canvas2DtoWebGL/litegl [1], but they would make the codebase much larger and more complex.

[1] https://github.com/jagenjo/Canvas2DtoWebGL

Super interesting product I'll definitely be taking a deeper look at this when I'm at work tomorrow.

I notice all your comparisons are with floating point and integer types. I was recently looking at SIMD as a possible way to speed up some of our calculations. But we create financial software and most data is typically stored as decimals not floating points to avoid problems with binary floating point precision during calculations.

Does quest handle decimals, just without the SIMD speed up?

Is this just a dead end? Are there any SIMD implementations that deal with decimal numbers? I considered Hacky workarounds like using integer based types internally and then treating them as as fixed point decimals for display but that doesn't give enough range for my purposes.

Can't you just use a binary int with the few bits being an implicit fraction (or just count in 'pennies' AKA the lowest denominction of that currency). Seems a straightforward solution, no?

Thanks! What sort of decimal type is it?

Well in code we're using C#'s decimal type (128 bit. 96 bits are used for an integer and the rest used for the sign and scaling factor) [0]. It's essentially just floating point applied to a base 10 integer rather than a binary one.

In the SQL server database the column types are usually decimal(18,5) or decimal(25,12) [1]

[0] - https://docs.microsoft.com/en-us/dotnet/api/system.decimal?v...

[1] - https://docs.microsoft.com/en-us/sql/t-sql/data-types/decima...

thanks! Someone else also pointed at lack of 128-bit long support. It looks as if long128 might help here.

So how does it compare to Clickhouse?

We have not benched against clickhouse yet. Sorry. Sounds like this should be an interesting things to do!

The broader question though is how does your offering compare to Clickhouse?

co-founder of questdb here - we have been asked the same question on reddit as well. We are starting to work on an article going through a comparison between QuestDB and Clickhouse today - this will also include a bench. Will share as soon as we can. stay tuned!

Cool. I see you're doing a regular (not compensated) horizontal sum in a loop. Horizontal sums are slow, but I'm guessing you wanted to have exactly the same result as if the sum was calculated sequentially (for doubles)? Do you know if any databases use more accurate summation methods (compensated summation)?

Are you referring to https://en.wikipedia.org/wiki/Kahan_summation_algorithm by any chance?

If this is the case - we can easily implement this algorithm.

There are several accurate summation algorithms; Kahan is one of them. Looks like Postgresql uses an accurate summation method[0] (and not Kahan), so that's probably a big factor in why theirs is so much slower and also makes for an unfair comparison. FWIW I posted AVX2 and AVX512 Kahan summation implementations here[1].

[0] https://github.com/postgres/postgres/blob/3ed2005ff595d34927...

[1] http://blog.zachbjornson.com/2019/08/11/fast-float-summation...

Thank you Zach, this is super interesting and useful! I wonder why PG do not use Kahan?

I came across QuestDB in the past, but never tried myself. At my company, we use kx and onetick. Could you please elaborate why you are also comparing with Postgres since it's not really a time-series database nor revendicating to be part of the "high performance" club?

because they seem to only support that, from their page:

"As of now, SIMD operations are available for non-keyed aggregation queries, such as select sum(value) from table."

not even sure if they support where clauses on that, sums of functions of a column, or even other things like stddev of the column.

their storage format though looks good and simple (similar to kdb actually), but they really should have an 8-byte char instead of the 16-byte one since that would be far more used.

their partitioning scheme is only on time so less advanced than other system.

single designated timestamp column (so no bitemporal), but do support asof joins which is nice.

they totally screwed up on dates and time. dates only to milli and timestamps only to micros. huge mistake.

long256 which is nice, but strangely no long128s (which wind up being nice when you have base-10 fixed-point numbers normalized to a large number of decimals).

i didn't see any fixed-width string/byte columns. Does have 32-bit symbols (i assume similar to kdb?) that might cover some of those use cases.

some good and some bad in there. never going to compete with kdb or onetick on performance (and nobody competes with kdb on query language/extensibity) , but could find a niche based on price and having simpler more easily adapted to querying and more human interface.

Good summary, thank you.

- we will extend SIMD to where clause, keyed aggregations, sampling, ordering, joins etc. It is a matter of time.

- do you mind elaborating on how we screwed up date and time?

- what makes you think we are never going to compete on performance with kdb+?

- nano are important for keeping ordering. while you may return results in insert order it is nice to have them so any other operations done in them outside the db you can retain (or recreate) that ordering. in financial systems nanos have become a sort of defacto standard for this. for examsple, all our messaging timestamps at places i've worked are always nano for anything written in the last 5-10 years.

Also when you are trying to do calculations on high-frequency data (tick, iot) it ruins your ability to take meaningful deltas (eg, arrival rates) since you get a lot of 0s and 1s for the time deltas. Its difficult to take weighted averages with weights of 0s.

the issues solving that (if you really need a wider range) are easier to solve that having to force everything down to micros and creating ways around that. (eg, kdb uses multiple date, time, and timestamp types and it doesn't use the unix epoch since it isn't very useful for tick, censor, or any high-frequency data i've seen).

better than a double that some systems still use.

-kdb's secret sauce that people don't seem to understand is its query language that more naturally fits the time series domain. It isn't really a database as more it is an array language with a database component. (eg, try to write an efficient sql query that calculates 5 minute bars on tick data).

I actually like Java too - I've written or worked on a couple trading systems written in core java. just get good devs who understand what it means to write zero gc code, abuse off-heap memory, and understand what hotspot can intrinsify. If you can stay in the simd code for all the heavy lifting loops (filters, aggregates, etc), I don't think java will be an impediment.

I think you have parts going in the correct direction, and you seem to have good experience from looking at the bios. Nothing really un-fixable (or un-addable) in what I saw glancing at your docs. I did bookmark you to see how the db goes. Will prob check out soon.

I'm a QuestDB dev, with regards to nano timestamps, we dont use a nanosecond timestamp because its not possible to be accurate to that resolution with current hardware. However, on a single host the nano second clocks are precise and monotonic, they would be useful to maintain order. I think they do make sense and we will have to look into providing timestamps to that resolution.

"its not possible to be accurate to that resolution with current hardware"

Are you referring to the clock precision of consumer grade hardware here?

In my experience the vast majority of financial time series data is reported in nanoseconds. The data providers, vendors, exchanges and data brokers absolutely have hardware capable of measuring timestamps in nanoseconds.

The accuracy doesn't have to be to 1ns of resolution to warrant measuring in nanos - even to the nearest 100ns is a useful and meaningful improvement beyond micros.

We are going to add a new type in the future to support nanos! Sorry for the confusion.

can you please contact me to jnordwick@gmail.com?

KDB works around this by storing the nanos in 64 bits but only for a particular time range.

1707.09.22D00:12:43.145224194 (max negative) to 2292.04.10D23:47:16.854775806 (max positive)

With 0 at 2000.01.01D00:00:00.000000000

The reason why you cannot ever compete with kdb+/q is because the database and language run in one address space. Your benchmark gets around this problem by using the built in sum() function, but kdb+/q can just execute arbitrary code and never suffer a performance penalty. Unless you plan on integrated a high performance programming language into your DB, it simply will not be possible to ever meaningfully compete on the effective total time of complex queries.

I, of course, am not disparaging your work, the performance numbers are very impressive!

I'm a QuestDB dev, data on a QuestDb is also stored in a single address space and SQL queries are compiled into objects that run in that address space. However, it is just SQL not a bespoke language. A future possibility would be to allow queries in java or scala.

Implementing the ability for QuestDB to dynamically load jars would be really cool. And if you exposed an interface to directly communicate with the Db, you could get rid of the SQL parsing overhead as well. This would also allow QuestDB to function as an app engine of sorts, just like kdb+/q. I see real value in that for latency sensitive financial applications.

You are right, PostgreSQL is not necessarily optimal for time-series workloads. In this case, the benchmark is a simple task, and not related to time-series. It's consists of reading 1 billion values to a table and sum them together. It doesn't get simpler than that.

The reason we are showcasing this instead of other more complex queries is because this is a simple, easily reproducible benchmark. It provides point of reference for performance figures.

Were these benchmarks before or after 2020.03.26? There was a bug that caused max operations to take twice as long.

From the KDB+ 4.0 release notes:

2020.03.26 FIX fixed performance regression for max. e.g. q)x:100000000?100;system"ts:10 max x"

these are against the latest 4.0 KDB+, after 26 March. KDB before that could not aggregate in parallel implicitly.

In your benchmarks, KDB's max on longs takes twice as long as sum on longs. I am not able to replicate this with the most recent version of KDB (2020.03.30).

    $QHOME/l64/q -s 4
    KDB+ 4.0 2020.03.30 Copyright (C) 1993-2020 Kx Systems
    l64/ 4()core 516718MB <snip>

    q)0.01*system"t do[100;max zz]"
    q)0.01*system"t do[100;sum zz]"
There's a marginal difference here rather than double.

For the version of KDB with the regression bug:

    q)0.01*system"t do[100;max zz]"
    q)0.01*system"t do[100;sum zz]"
Which starts to look more like your numbers.

sorry, i think we tested with this one:

    KDB+ 4.0 2020.03.17 Copyright (C) 1993-2020 Kx Systems
    l64/ 12(16)core 63960MB
    EXPIRE 2021.03.26

Even without that 2x, all benchmarks are still in the same ballpark which is impressive in its own right. It will be interesting to see where this goes.

The numbers are impressive, especially because it is against kdb. q/kdb is mostly finance focused and closed source so not really flexible. questdb has an advantage on this, it might be a bit of a tangent but I wonder if this could be used to replace redis, I can see how having SQL as a querying language could be a big plus.

it would depend on the use case of course. QuestDB goes a bit further in that it provides speed close to in-memory loads while actually storing the data on disk. If there is enough interest, we could consider releasing an "in-memory only" mode for these use cases.

In the reference there's no mention of SQL Window functions. Is it possible to do multiple moving averages over different time spans?

If not, are there plans to add support in the future?

Window functions are in draft, we will release them imminently. We will support moving averages. In fact we plan to support generic multi-pass and window functions. Having multi-pass will allow you to do things like `select sum(x -sum(x)) from tab`.

those things are so useless if your time points arent equidistant (ie, you don't care about the last 100 rows, you care about the last 5 minutes). they basically force you to use very slow correlated subqueries. please do something better.

Great work! 2.3x faster than kdb and 500x than postgres for sum (double) time series What are your goals for the next release?

thank you! Next release will feature similar acceleration of group-by and time sampling queries.

I assume the values must be all be in memory beforehand and not hard storage.

That sum() SQL is all you do as a user. There is no additional magic there.

The performance numbers are indeed best when data is in memory. However in reality sum() goes over memory mapped file, lazy loading data as required.

So how would this compare to the naïve approach of using a cursor in another fast, memory-mapped database, like LMDB?

What about data compression? How does that compare to other time series DBs?

There is no dedicated data compression at the moment as our use-case is centered around performance. That being said, we store data very efficiently with minimal overhead.

One interesting consequence is that in many cases, we don't require indexes where other databases do. This synthetically compress the data relative to other DBs by removing the space taken by these indexes while improving query speed. It also means ingestion speed remains fast with O(1) complexity.

> we store data very efficiently with minimal overhead.

Do you use encoding like delta-of-delta timestamps or something similar?

> One interesting consequence is that in many cases, we don't require indexes where other databases do.

I don't follow. Why don't you require indexes where other databases do?


We don't store deltas, timestamps are stored as 64-bit int back to back in column. So are other primitives actually.

To avoid using indexes we store timestamps in ascending order. Timestamp interval search uses partitioning to cull chunks of data before lifting data in memory. Once in memory we use binary search on ordered timestamps to find intervals.

That said we do support indexes for key-based lookups

What dialect of SQL are you using?

We try to stick to ANSI SQL as best as we can, but we have own own dialect. This should be a good entry point: https://www.questdb.io/docs/select

Why your own dialect instead of Calcite SQL or ZetaSQL?

We support postgres wire protocol, so we are going to be leaning towards PostgreSQL dialect to allow existing PostgreSQL infrastructure connect to us.

is it difficult to add SIMD to the existing time-series database?

columnar, probably. row-based not possible.

This can't be called a database due to lacking of any persistent storage. It does not survive a system crash.

It is a structured in-memory (or rather in-JVM) cache with a rudimentary SQL interface.

Calling things by its proper names is a half-way to intelligence.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact