
Using SIMD to aggregate billions of values per second - bluestreak
https://www.questdb.io/blog/2020/04/02/using-simd-to-aggregate-billions-of-rows-per-second
======
bluestreak
QuestDB co-founder and CTO here - happy to share questdb, a performance-driven
open-source time-series database that uses SQL.

High performance databases have a reputation of being inaccessible. They are
expensive, closed-source, and require complex proprietary languages.

We have made our code available under Apache 2.0. With this new release,
QuestDB leverages SIMD instructions, vectorization and parallel execution to
achieve performance figures at the top end of the high-performance database
class.
Results are shown in our blog post.
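To make the "SIMD + vectorization" idea concrete for readers who haven't seen
it: the core trick is keeping several independent accumulators (lanes) running
at once and reducing them at the end. A toy Python sketch of the access
pattern only (illustrative; QuestDB's real implementation is native code):

```python
def lane_sum(values, lanes=4):
    # Mimic a SIMD vector sum: `lanes` independent partial sums,
    # filled in strides, then reduced with a final "horizontal" add.
    acc = [0] * lanes
    for i, v in enumerate(values):
        acc[i % lanes] += v
    return sum(acc)
```

For floating-point columns this reorders the additions, which is why a
vectorized sum can round differently than a sequential one.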

I sincerely hope that you will find this helpful, and that this will unlock
new possibilities! In the meantime, we will carry on working on QuestDB,
adding more features, and taking this speed to more query types. Feel free to
join us on slack or Github if you want to participate.

~~~
numlock86
Serious but bold question: What are the benefits versus Clickhouse for
example? Why should I use QuestDB?

Amazing work either way. The space of databases can never have too much
competition.

~~~
bluestreak
It is hard for me to say right now as we have not benchmarked against
Clickhouse yet; this is clearly the most requested comparison. We will come
back on this!

------
polskibus
It would be great if you included Clickhouse in your benchmark. It also boasts
heavy SIMD use and is free + open source.

------
leeoniya
not to hijack too much, but since this is on the topic of timeseries...i'm
currently working on a fast* Canvas2D timeseries chart:

[https://github.com/leeoniya/uPlot](https://github.com/leeoniya/uPlot)

* ~4,000 pts/ms on an i5 and integrated gpu

~~~
bufferoverflow
Why canvas2d when WebGL is much faster?

~~~
leeoniya
a few reasons.

the main one is that i know canvas and can implement everything i need, but
would need to learn a lot of webgl to get any further than a basic PoC.

browsers limit how many webgl contexts you can acquire. chrome is capped at
16, so if you need more than 16 charts on a page, you're out of luck.

initializing webgl is actually slower than canvas 2d.

it's difficult to draw lines of different thicknesses in raw webgl - it takes
a lot of code.

there are projects which abstract webgl nicely to provide canvas-like
ergonomics, like Two.js, Pixi.js, or Canvas2DtoWebGL/litegl [1], but they
would make the codebase much larger and more complex.

[1]
[https://github.com/jagenjo/Canvas2DtoWebGL](https://github.com/jagenjo/Canvas2DtoWebGL)

------
SimonPStevens
Super interesting product. I'll definitely be taking a deeper look at this
when I'm at work tomorrow.

I notice all your comparisons are with floating point and integer types. I was
recently looking at SIMD as a possible way to speed up some of our
calculations. But we create financial software and most data is typically
stored as decimals not floating points to avoid problems with binary floating
point precision during calculations.

Does QuestDB handle decimals, just without the SIMD speed-up?

Is this just a dead end? Are there any SIMD implementations that deal with
decimal numbers? I considered hacky workarounds like using integer-based types
internally and then treating them as fixed-point decimals for display, but
that doesn't give enough range for my purposes.
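For context on that workaround: scaled 64-bit integers make addition and
comparison exact, at the cost of range. A Python sketch (scale and names are
made up; Python ints are arbitrary-precision, so the 64-bit range limit is
simulated with an explicit check):

```python
from decimal import Decimal

SCALE = 10 ** 5              # 5 fractional digits, decimal(18,5)-style
I64_MAX = 2 ** 63 - 1        # simulated 64-bit long range

def to_fixed(s: str) -> int:
    """Parse a decimal string into a scaled integer, e.g. "0.1" -> 10000."""
    v = int(Decimal(s) * SCALE)
    if not -I64_MAX - 1 <= v <= I64_MAX:
        raise OverflowError("out of 64-bit range")  # the range problem above
    return v

def fixed_add(a: int, b: int) -> int:
    return a + b                     # exact, unlike binary floats

def fixed_mul(a: int, b: int) -> int:
    return a * b // SCALE            # multiply, then rescale

def to_str(v: int) -> str:
    return str(Decimal(v) / SCALE)
```

Here 0.1 + 0.2 comes out exactly 0.3, but 25 significant digits need roughly
84 bits, which is where a 64-bit long runs out of room.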

~~~
bluestreak
Thanks! What sort of decimal type is it?

~~~
SimonPStevens
Well in code we're using C#'s decimal type (128 bit. 96 bits are used for an
integer and the rest used for the sign and scaling factor) [0]. It's
essentially just floating point applied to a base 10 integer rather than a
binary one.

In the SQL server database the column types are usually decimal(18,5) or
decimal(25,12) [1]

[0] - [https://docs.microsoft.com/en-us/dotnet/api/system.decimal?view=netframework-4.8](https://docs.microsoft.com/en-us/dotnet/api/system.decimal?view=netframework-4.8)

[1] - [https://docs.microsoft.com/en-us/sql/t-sql/data-types/decimal-and-numeric-transact-sql?view=sql-server-ver15](https://docs.microsoft.com/en-us/sql/t-sql/data-types/decimal-and-numeric-transact-sql?view=sql-server-ver15)

~~~
bluestreak
thanks! Someone else also pointed at the lack of 128-bit long support. It
looks as if long128 might help here.

------
jmakov
So how does it compare to Clickhouse?

~~~
bluestreak
We have not benchmarked against Clickhouse yet. Sorry. Sounds like it should
be an interesting thing to do!

~~~
missosoup
The broader question though is how does your offering compare to Clickhouse?

~~~
j1897
co-founder of questdb here - we have been asked the same question on reddit as
well. We are starting work today on an article going through a comparison
between QuestDB and Clickhouse - this will also include a bench. Will share as
soon as we can. Stay tuned!

------
zbjornson
Cool. I see you're doing a regular (not compensated) horizontal sum in a loop.
Horizontal sums are slow, but I'm guessing you wanted to have exactly the same
result as if the sum was calculated sequentially (for doubles)? Do you know if
any databases use more accurate summation methods (compensated summation)?

~~~
bluestreak
Are you referring to
[https://en.wikipedia.org/wiki/Kahan_summation_algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm)
by any chance?

If this is the case - we can easily implement this algorithm.
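For reference, the scalar version is only a few lines; a Python sketch (a
vectorized variant would keep one compensation term per SIMD lane):

```python
def kahan_sum(values):
    total = 0.0
    c = 0.0                   # running compensation for lost low-order bits
    for x in values:
        y = x - c             # re-inject previously lost bits
        t = total + y
        c = (t - total) - y   # what just got rounded away
        total = t
    return total

# Naive summation silently drops the small addends:
data = [1e16] + [1.0] * 10000
sum(data)        # 1e16 -- every +1.0 is rounded away
kahan_sum(data)  # 1.000000000001e+16 -- exact
```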

~~~
zbjornson
There are several accurate summation algorithms; Kahan is one of them. Looks
like Postgresql uses an accurate summation method[0] (and not Kahan), so
that's probably a big factor in why theirs is so much slower and also makes
for an unfair comparison. FWIW I posted AVX2 and AVX512 Kahan summation
implementations here[1].

[0]
[https://github.com/postgres/postgres/blob/3ed2005ff595d34927...](https://github.com/postgres/postgres/blob/3ed2005ff595d349276e5b2edeca1a8100b08c87/src/backend/utils/adt/float.c#L2723)

[1] [http://blog.zachbjornson.com/2019/08/11/fast-float-summation.html](http://blog.zachbjornson.com/2019/08/11/fast-float-summation.html)

~~~
bluestreak
Thank you Zach, this is super interesting and useful! I wonder why PG does not
use Kahan?

------
quod_2058
I came across QuestDB in the past, but never tried it myself. At my company,
we use kx and onetick. Could you please elaborate on why you are also
comparing with Postgres, since it's not really a time-series database nor
claiming to be part of the "high performance" club?

~~~
jnordwick
because they seem to only support that, from their page:

"As of now, SIMD operations are available for non-keyed aggregation queries,
such as select sum(value) from table."

not even sure if they support where clauses on that, sums of functions of a
column, or even other things like stddev of the column.

their storage format though looks good and simple (similar to kdb actually),
but they really should have an 8-bit char instead of the 16-bit one since that
would be far more used.

their partitioning scheme is only on time, so less advanced than other systems.

single designated timestamp column (so no bitemporal), but do support asof
joins which is nice.

they totally screwed up on dates and time. dates only to milli and timestamps
only to micros. huge mistake.

long256 which is nice, but strangely no long128s (which wind up being nice
when you have base-10 fixed-point numbers normalized to a large number of
decimals).

i didn't see any fixed-width string/byte columns. Does have 32-bit symbols (i
assume similar to kdb?) that might cover some of those use cases.

some good and some bad in there. never going to compete with kdb or onetick on
performance (and nobody competes with kdb on query language/extensibility),
but could find a niche based on price and on having a simpler, more human,
more easily adapted query interface.

~~~
bluestreak
Good summary, thank you.

\- we will extend SIMD to where clause, keyed aggregations, sampling,
ordering, joins etc. It is a matter of time.

\- do you mind elaborating on how we screwed up date and time?

\- what makes you think we are never going to compete on performance with
kdb+?

~~~
jnordwick
\- nanos are important for keeping ordering. while you may return results in
insert order, it is nice to have them so that any operations done on the data
outside the db can retain (or recreate) that ordering. in financial systems
nanos have become a sort of de facto standard for this. for example, all our
messaging timestamps at places i've worked have been nanos for anything
written in the last 5-10 years.

Also when you are trying to do calculations on high-frequency data (tick, iot)
it ruins your ability to take meaningful deltas (eg, arrival rates), since you
get a lot of 0s and 1s for the time deltas. It's difficult to take weighted
averages with weights of 0.
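The truncation effect is easy to demonstrate (arrival times below are
nanosecond epoch values invented for illustration):

```python
def deltas(ts):
    # Inter-arrival gaps between consecutive timestamps.
    return [b - a for a, b in zip(ts, ts[1:])]

nanos = [1_000_000_000_000, 1_000_000_000_400, 1_000_000_000_900]
micros = [t // 1_000 for t in nanos]      # same instants at micro resolution

deltas(nanos)   # [400, 500] -- meaningful arrival gaps
deltas(micros)  # [0, 0]     -- the gaps vanish entirely
```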

the issues with a nano representation (if you really need a wider range) are
easier to solve than having to force everything down to micros and create
workarounds for that. (eg, kdb uses multiple date, time, and timestamp types,
and it doesn't use the unix epoch since it isn't very useful for tick, sensor,
or any high-frequency data i've seen).

better than a double that some systems still use.

\- kdb's secret sauce that people don't seem to understand is its query
language, which more naturally fits the time-series domain. It isn't really a
database so much as an array language with a database component. (eg, try to
write an efficient sql query that calculates 5-minute bars on tick data).
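For readers unfamiliar with "bars": it's a time-bucketed OHLC aggregation, the
kind of thing array languages express natively. A plain-Python sketch of what
the query has to compute (ticks and prices invented for illustration):

```python
def five_minute_bars(ticks):
    """ticks: time-ordered (epoch_seconds, price) pairs.
    Returns {bucket_start: (open, high, low, close)} per 5-minute bucket."""
    bars = {}
    for ts, px in ticks:
        start = ts - ts % 300                  # floor to the 5-minute boundary
        if start not in bars:
            bars[start] = (px, px, px, px)     # first tick opens the bar
        else:
            o, h, l, _ = bars[start]
            bars[start] = (o, max(h, px), min(l, px), px)
    return bars

ticks = [(0, 10.0), (100, 12.0), (299, 9.0), (300, 11.0)]
five_minute_bars(ticks)
# {0: (10.0, 12.0, 9.0, 9.0), 300: (11.0, 11.0, 11.0, 11.0)}
```

Expressing this in standard SQL typically means correlated subqueries or
window-function gymnastics, which is the point being made about fit.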

I actually like Java too - I've written or worked on a couple trading systems
written in core java. just get good devs who understand what it means to write
zero gc code, abuse off-heap memory, and understand what hotspot can
intrinsify. If you can stay in the simd code for all the heavy lifting loops
(filters, aggregates, etc), I don't think java will be an impediment.

I think you have parts going in the correct direction, and you seem to have
good experience from looking at the bios. Nothing really un-fixable (or un-
addable) in what I saw glancing at your docs. I did bookmark you to see how
the db goes. Will prob check out soon.

~~~
patrick73_uk
I'm a QuestDB dev. With regards to nano timestamps, we don't use a nanosecond
timestamp because it's not possible to be accurate to that resolution with
current hardware. However, on a single host the nanosecond clocks are precise
and monotonic, so they would be useful to maintain order. I think they do make
sense and we will have to look into providing timestamps at that resolution.

~~~
salmonlogs
"its not possible to be accurate to that resolution with current hardware"

Are you referring to the clock precision of consumer grade hardware here?

In my experience the vast majority of financial time series data is reported
in nanoseconds. The data providers, vendors, exchanges and data brokers
absolutely have hardware capable of measuring timestamps in nanoseconds.

The accuracy doesn't have to be to 1ns of resolution to warrant measuring in
nanos - even to the nearest 100ns is a useful and meaningful improvement
beyond micros.

~~~
bluestreak
We are going to add a new type in the future to support nanos! Sorry for the
confusion.

~~~
jnordwick
can you please contact me to jnordwick@gmail.com?

------
binomiq
Were these benchmarks before or after 2020.03.26? There was a bug that caused
max operations to take twice as long.

From the KDB+ 4.0 release notes:

2020.03.26 FIX fixed performance regression for max. e.g.
q)x:100000000?100;system"ts:10 max x"

~~~
bluestreak
these are against the latest 4.0 KDB+, after 26 March. KDB before that could
not aggregate in parallel implicitly.

~~~
binomiq
In your benchmarks, KDB's max on longs takes twice as long as sum on longs. I
am not able to replicate this with the most recent version of KDB
(2020.03.30).

    
    
        $QHOME/l64/q -s 4
        KDB+ 4.0 2020.03.30 Copyright (C) 1993-2020 Kx Systems
        l64/ 4()core 516718MB <snip>
    
        q)zz:1000000000?1000j
        q)0.01*system"t do[100;max zz]"
        257.52
        q)0.01*system"t do[100;sum zz]"
        251.95
    

There's a marginal difference here rather than double.

For the version of KDB with the regression bug:

    
    
        q)0.01*system"t do[100;max zz]"
        512.21
        q)0.01*system"t do[100;sum zz]"
        254.24
    

Which starts to look more like your numbers.

~~~
bluestreak
sorry, i think we tested with this one:

    
    
        KDB+ 4.0 2020.03.17 Copyright (C) 1993-2020 Kx Systems
        l64/ 12(16)core 63960MB
        EXPIRE 2021.03.26

~~~
dintech
Even without that 2x, all benchmarks are still in the same ballpark which is
impressive in its own right. It will be interesting to see where this goes.

------
sirffuzzylogik
The numbers are impressive, especially because it is against kdb. q/kdb is
mostly finance-focused and closed source, so not really flexible; questdb has
an advantage there. It might be a bit of a tangent, but I wonder if this could
be used to replace redis. I can see how having SQL as a query language could
be a big plus.

~~~
bluestreak
it would depend on the use case of course. QuestDB goes a bit further in that
it provides speed close to in-memory loads while actually storing the data on
disk. If there is enough interest, we could consider releasing an "in-memory
only" mode for these use cases.

------
stereosteve
In the reference there's no mention of SQL Window functions. Is it possible to
do multiple moving averages over different time spans?

If not, are there plans to add support in the future?

~~~
TheTank
Window functions are in draft, we will release them imminently. We will
support moving averages. In fact we plan to support generic multi-pass and
window functions. Having multi-pass will allow you to do things like `select
sum(x -sum(x)) from tab`.

~~~
jnordwick
those things are useless if your time points aren't equidistant (ie, you don't
care about the last 100 rows, you care about the last 5 minutes). they
basically force you to use very slow correlated subqueries. please do
something better.

------
jeromerousselot
Great work! 2.3x faster than kdb and 500x faster than postgres for sum
(double) on time series. What are your goals for the next release?

~~~
bluestreak
thank you! Next release will feature similar acceleration of group-by and time
sampling queries.

------
popotamonga
I assume the values must all be in memory beforehand and not in hard storage.

~~~
bluestreak
That sum() SQL is all you do as a user. There is no additional magic there.

The performance numbers are indeed best when data is in memory. However, in
reality sum() goes over a memory-mapped file, lazily loading data as required.
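A minimal sketch of that pattern using Python's stdlib mmap (the file layout,
back-to-back little-endian 64-bit longs, is an assumption for illustration):

```python
import mmap
import struct

def sum_mapped_longs(path):
    """Sum 64-bit little-endian longs straight off a memory-mapped file.
    Nothing is read up front: the OS faults pages in lazily as the sum
    touches them, which is the lazy-loading behaviour described above."""
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        count = len(mm) // 8
        return sum(struct.unpack_from("<%dq" % count, mm, 0))
```

If the file is already in the page cache, this runs at in-memory speed; on a
cold cache it degrades gracefully to disk throughput.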

~~~
heartbeats
So how would this compare to the naïve approach of using a cursor in another
fast, memory-mapped database, like LMDB?

------
continuations
What about data compression? How does that compare to other time series DBs?

~~~
bluestreak
There is no dedicated data compression at the moment as our use-case is
centered around performance. That being said, we store data very efficiently
with minimal overhead.

One interesting consequence is that in many cases, we don't require indexes
where other databases do. This synthetically compresses the data relative to
other DBs by removing the space taken by those indexes, while improving query
speed. It also means ingestion speed remains fast with O(1) complexity.

~~~
continuations
> we store data very efficiently with minimal overhead.

Do you use encoding like delta-of-delta timestamps or something similar?

> One interesting consequence is that in many cases, we don't require indexes
> where other databases do.

I don't follow. Why don't you require indexes where other databases do?

Thanks.

~~~
bluestreak
We don't store deltas; timestamps are stored as 64-bit ints back to back in a
column. So are the other primitives, actually.

To avoid using indexes we store timestamps in ascending order. Timestamp
interval search uses partitioning to cull chunks of data before lifting data
into memory. Once in memory, we use binary search on the ordered timestamps to
find intervals.
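The interval lookup on an ordered timestamp column amounts to two binary
searches; sketched with Python's bisect (column contents invented):

```python
from bisect import bisect_left, bisect_right

def interval_slice(timestamps, lo, hi):
    """Row range covering lo <= ts <= hi in an ascending timestamp
    column -- two O(log n) probes, no index needed."""
    first = bisect_left(timestamps, lo)
    last = bisect_right(timestamps, hi)
    return first, last          # half-open row range [first, last)

ts = [10, 20, 20, 30, 50, 80]   # ascending designated timestamps
interval_slice(ts, 20, 50)      # (1, 5) -> rows [20, 20, 30, 50]
```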

That said, we do support indexes for key-based lookups.

------
georgewfraser
What dialect of SQL are you using?

~~~
bluestreak
We try to stick to ANSI SQL as best we can, but we have our own dialect.
This should be a good entry point:
[https://www.questdb.io/docs/select](https://www.questdb.io/docs/select)

~~~
georgewfraser
Why your own dialect instead of Calcite SQL or ZetaSQL?

~~~
bluestreak
We support the postgres wire protocol, so we are going to lean towards the
PostgreSQL dialect to allow existing PostgreSQL infrastructure to connect to
us.

------
sch00lb0y
is it difficult to add SIMD to the existing time-series database?

~~~
jnordwick
columnar, probably. row-based not possible.

------
johndoe42377
This can't be called a database due to the lack of any persistent storage. It
does not survive a system crash.

It is a structured in-memory (or rather in-JVM) cache with a rudimentary SQL
interface.

Calling things by their proper names is half-way to intelligence.

