Hacker News new | past | comments | ask | show | jobs | submit login

I was curious so I tried it in postgres 13. Postgres, at least, uses the index to form a bitmap and scans that when aggregating in the first case (10% rows in the bitmap) and not in a second case WHERE "createdAt" BETWEEN '1990-01-01 00:00:00' AND '2020-12-31 23:59:59'; (100% rows in the bitmap, obviating the need for the intermediate step). I also tried ~20% rows (2019-2020) and the planner skipped the index. '''

CREATE TABLE temp (id SERIAL PRIMARY KEY, amount MONEY, "createdAt" TIMESTAMPTZ); CREATE INDEX ON temp ("createdAt");

INSERT INTO temp(id, "createdAt", amount) SELECT generate_series(1,1000000) AS id, NOW() + (random() * (interval '10 years')) - interval '10 years' AS createdAt, random() * 100::money AS amount.

EXPLAIN SELECT sum(amount) FROM temp WHERE "createdAt" BETWEEN '2020-01-01 00:00:00' AND '2020-12-31 23:59:59';

Aggregate (cost=10286.06..10286.07 rows=1 width=8) -> Bitmap Heap Scan on temp (cost=2148.00..10033.48 rows=101032 width=8) Recheck Cond: (("createdAt" >= '2020-01-01 00:00:00-05'::timestamp with time zone) AND ("createdAt" <= '2020-12-31 23:59:59-05'::timestamp with time zone)) -> Bitmap Index Scan on "temp_createdAt_idx" (cost=0.00..2122.75 rows=101032 width=0) Index Cond: (("createdAt" >= '2020-01-01 00:00:00-05'::timestamp with time zone) AND ("createdAt" <= '2020-12-31 23:59:59-05'::timestamp with time zone))

And when running a longer query: Finalize Aggregate (cost=14596.71..14596.72 rows=1 width=8) -> Gather (cost=14596.49..14596.70 rows=2 width=8) Workers Planned: 2 -> Partial Aggregate (cost=13596.49..13596.50 rows=1 width=8) -> Parallel Seq Scan on temp (cost=0.00..12620.00 rows=390597 width=8) Filter: (("createdAt" >= '1990-01-01 00:00:00-05'::timestamp with time zone) AND ("createdAt" <= '2020-12-31 23:59:59-05'::timestamp with time zone))




Buffer pool plays a big part. Very possible all the data is already in-memory, and for certain data sizes it'll be faster to just follow leaf nodes start-to-finish than it is to determine what pages you can skip.

Postgres buffer pool is a ring, and relies on "clock sweep" to decide what pages it can evict on each iteration. It has a shared buffer, and per-query buffers to eliminate shared buffer evictions (for costly queries). When doing index scans, worst-case the same page is being accessed in random order multiple times and it's evicted between those accesses so we end up with redundant disk I/O.

Bitmap scans ensure each page is only scanned once and in-order, so it's a great solution when you need more than an index scan but less than a full table scan (worth of data), not to mention multiple indexes can be combined into one bitmap scan.

If every page is already in memory, the query planner may pick plans that look sub-optimal if you factor in disk I/O but are otherwise very efficient in-memory.


FWIW you can indent with 4 spaces for a preformatted block e.g.

    CREATE TABLE temp (id SERIAL PRIMARY KEY, amount MONEY, "createdAt" TIMESTAMPTZ); CREATE INDEX ON temp ("createdAt");
    INSERT INTO temp(id, "createdAt", amount) SELECT generate_series(1,1000000) AS id, NOW() + (random() * (interval '10 years')) - interval '10 years' AS createdAt, random() * 100::money AS amount;




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: