> It’s probably also fast because you have a warm cache - e.g. there’s enough memory for the DB to have the indexes 100% in memory, which is just not feasible with large DBs in the real world, where you can easily have >100GB of indexes + hot data, and the DB can’t keep it all in memory.
That's the point? Sure, there is a scale where it's infeasible, but you can quite easily (albeit at a price) get DB instances with hundreds or thousands of GiB of RAM. And even if you can't fit everything into it, your working set is often not the size of the entire data set. My company's main MySQL primary has around 800 GiB of storage, and only about 1/3 of that in RAM. Disk read IOPS are usually 0.
Nevertheless, I recreated this in Postgres 15.
The total index size of the two tables is 790 MB according to `\di+*`, so I'll set `shared_buffers` to 263 MB (roughly a third of that, mirroring the RAM ratio above), then restart and re-run.
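(For anyone following along, one way to do that sizing step - I'm assuming the `ALTER SYSTEM` route here; editing `postgresql.conf` directly works just as well:)

postgres=# \di+ *
postgres=# ALTER SYSTEM SET shared_buffers = '263MB';

`shared_buffers` can only change at server start, hence the restart before re-running.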
For reference, time with current settings is about 6.8 msec for `is_primary`, and 1395 msec for `NOT is_primary`.
After a restart, I ran the query for `NOT is_primary` first, which took 1740 msec. The first run of the query with `is_primary` took 21 msec.
My DB is hosted on older hardware, and the files themselves live on NVMe drives presented via Ceph over 1GbE.
EDIT: I forgot that Postgres leans on the OS page cache for a lot of its caching, not just its own `shared_buffers`. Re-did this, running `sync; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'` in between shutting down/starting up Postgres. 16726 msec and 79 msec, respectively. So yes, a lot slower, but (a) I don't think this is realistic for a production database, and (b) I'm still not clear on how you think JOINs enter into this. The slowdown comes entirely from having to run a table scan.
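(Spelled out, the cold-cache cycle is roughly the following - the service name is an assumption, adjust for however your Postgres is managed:)

sudo systemctl stop postgresql                    # shut Postgres down so shared_buffers is gone
sync; sync                                        # flush dirty pages to disk
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'    # drop the OS page cache plus dentries/inodes
sudo systemctl start postgresql                   # come back up with everything cold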
First off, ty for running all these benchmarks, above and beyond!
FWIW, I don't think joins are bad - I'm 100% for normalized DB schemas with joins. But I've done tonnes of performance work over the past ~10 years and run into a bunch of real-world cases where, when caches are cold (which does happen frequently with large datasets and limited budgets), queries similar to the above (join two tables, read a page of data, sorting on one table and filtering on the other) take tens of seconds, sometimes even minutes with very large datasets. In those cases, the only solution has been to denormalize so we can create a compound index, with filter key(s) as the prefix and the sort key as the suffix, which makes the query consistently fast. I'm not at all suggesting this as the default - better to default to normalized schemas with joins, and only do this for these specific cases. Generally a company has just one or a few cases where this is necessary, and can do the ugly denormalization + compound indexes for those few hot spots. But when people say "joins are slow", these cases are examples of where it's true.
Re: your above 17 second query, if you do something like adding fields to the user table for primary and secondary contact method (denormalizing), and then create a compound index with the necessary filter keys first, and the sort key second, I think you’ll find the query (which no longer needs a join) is quite fast even if caches are ice cold.
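Something along these lines is what I have in mind - the column and index names here are just illustrative, not your actual schema:

postgres=# ALTER TABLE users ADD COLUMN phone_primary text, ADD COLUMN phone_secondary text;
postgres=# CREATE INDEX users_sec_phone_created_at ON users (phone_secondary, created_at);

When the filter on the leading column(s) is an equality, the index hands rows back already ordered by `created_at`, so the page query needs neither a sort nor a join.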
Created a new table that contains `user_id, created_at, phone_primary, phone_secondary` and inserted all 10,200,000 rows. Notably (I'll come back to this), due to how the rows were generated, the primary key (`user_id`) is an unsorted integer - this was _not_ done with a serial or identity column.
postgres=# CREATE INDEX sec_phone_created_at ON hn_phone_new (phone_secondary, created_at) WHERE phone_secondary IS NOT NULL;
I reset `shared_buffers` down to the same as before - 263 MB - although the size of this index is tiny, < 10 MB, so realistically I can't shrink buffers down that far anyway. I then did the same `sync/drop cache` as before.
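The query being timed is the same paging query as before - shaped roughly like this (the column list, sort direction, and page size are illustrative, not the exact statement):

postgres=# SELECT user_id, phone_secondary, created_at
postgres-#   FROM hn_phone_new
postgres-#  WHERE phone_secondary IS NOT NULL
postgres-#  ORDER BY created_at
postgres-#  LIMIT 100;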
So yes, significant improvement as you'd expect. I then dropped the index and swapped the order:
postgres=# DROP INDEX sec_phone_created_at;
postgres=# CREATE INDEX created_at_sec_phone ON hn_phone_new (created_at, phone_secondary) WHERE phone_secondary IS NOT NULL;
Reset everything as before, and re-ran the same query:
Time: 221.392 ms
Thinking that, like MySQL, Postgres might have saved a portion of `shared_buffers` to disk and loaded it back in on restart (honestly, I don't know if Postgres does this), I attempted to flush it by running a few `SELECT COUNT(*)` queries on other, larger tables, then re-running the query.
Time: 365.961 ms
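(As an aside, a more direct way to check whether the index pages are sitting in `shared_buffers` would be the `pg_buffercache` extension - a sketch, not something I ran here:)

postgres=# CREATE EXTENSION IF NOT EXISTS pg_buffercache;
postgres=# SELECT count(*) AS cached_buffers
postgres-#   FROM pg_buffercache
postgres-#  WHERE relfilenode = pg_relation_filenode('created_at_sec_phone')
postgres-#    AND reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database());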
This is what `EXPLAIN VERBOSE` looks like for the original index:
Yeah, I believe which order is best (sortKey/filterKey or filterKey/sortKey) really depends on the specific data/queries - best to try both and pick whichever wins, which looks like sortKey/filterKey in this case :)
But I think this does show how, for specific problem data/queries, you sometimes do have to denormalize so that you can create the ideal compound index. The default should still be normalized schemas with joins, but if problem queries pop up like this - taking 10, 20, 30 seconds when caches are cold - compromising a bit on clean schemas/code for performance makes sense.
If you’re limited in RAM and can’t upsize, then yes, this does appear to be a good trade off. You can always refactor later and normalize if necessary.
BTW, although it wouldn't have helped with your specific benchmark schema's creation of TYPEs, I'll plug my genSQL tool [0] for generating random data. It's primarily designed around MySQL, but it can easily produce CSVs, which every DB can load.
It turns out that a lot of random() calls in most languages are painfully slow, so mine avoids that by (mostly) batching them in a C library. It should be able to create a million somethings in under 10 seconds on modern hardware in Python 3.11.