
“SQL queries ran up to 29 times faster on CrateDB than they did on PostgreSQL” - nslater
https://crate.io/a/benchmarking-complex-query-performance-cratedb-postgresql/
======
MarkusWinand
I'm curios if this is a fair benchmark (considering it was conducted by one of
the vendors).

So I downloaded the mentioned whitepaper to look at the indexes they created
in PostgreSQL.

The only thing mentioned in the whitepaper is this table definition:

    
    
      CREATE TABLE IF NOT EXISTS t1 (
      "uuid" VARCHAR,
      "ts" TIMESTAMP, "tenant_id" INT4, "sensor_id" VARCHAR,   
      "sensor_type" VARCHAR, "v1" INT4,
      "v2" INT4,
      "v3" FLOAT4,
      "v4" FLOAT4,
      "v5" BOOL,
      "week_generated" TIMESTAMP, "taxonomy" ltree
      );
    

The table definition in Crate is a little different:

    
    
      CREATE TABLE IF NOT EXISTS b.t2 ( "uuid" STRING,
      "ts" TIMESTAMP,
      "tenant_id" INTEGER, "sensor_id" STRING, "sensor_type" STRING,
      "v1" INTEGER,
      "v2" INTEGER,
      "v3" FLOAT,
      "v4" FLOAT,
      "v5" BOOLEAN,
      "week_generated" TIMESTAMP GENERATED ALWAYS AS     date_trunc('week', ts),
       INDEX "taxonomy" USING FULLTEXT (sensor_type) WITH (analyzer='tree')
      ) PARTITIONED BY ("week_generated") 
      CLUSTERED BY ("tenant_id") INTO 3 SHARDS;
    

I don't know anything about how Crate works, but I see PARTITIONED BY
("week_generated") and CLUSTERED BY ("tenant_id") INTO 3 SHARDS.

Now looking at the first query, both, their PARTITION BY as well as CLUSTER BY
columns appear in the query:

    
    
      SELECT min(v1) as v1_min, max(v1) as v1_max, avg(v1) as v1_avg, sum(v1) as v1_sum
      FROM b.t2
      WHERE tenant_id = ? AND week_generated BETWEEN ? AND ?;
    

Unless there is a sufficiently good index used in PostgreSQL, this doesn't
seem to be a fair benchmark.

In the Appendix they mention how the partitioning could be implemented in
PostgreSQL, also mention the indexes to be used together with the
partitioning, but also say: "Because of this, we abandoned the partitioning
approach for our benchmark, instead opting for a single PostgreSQL table with
no partition logic."

I didn't find any statement about if/which indexes they were using in
PostgreSQL when measuring their benchmarks.

They also use a rather old PostgreSQL version (9.2 — released in 2012). Now,
we have PostgreSQL 9.6 which has some parallel query execution support that
might also change this figures dramatically.

~~~
jodok
thanks for the comment, we'll update the indexes used in postgres. we tried to
be as objective as possible and of course show our sweet spot. we don't want
to treat postgres unfair, if you have suggestions on how to improve indexing
in postgres we'd be happy to take that into account. (full disclosure: i'm co-
founder of crate.io).

