
PostgreSQL Count(*) Performance Improvements - turrini
https://www.cybertec-postgresql.com/en/count-made-fast/
======
shay_ker
The headline makes it sound like they did work to _improve_ the performance of
count(*) in the postgres code, when that's not at all what the article is
about.

This post from Citus is far more informative:

[https://www.citusdata.com/blog/2016/10/12/count-performance/](https://www.citusdata.com/blog/2016/10/12/count-performance/)

~~~
pkmishra
+1. Citusdata article is much better.

------
BenMorganIO
Given the title, I expected this to be about how PG improved their counting.
This is not what it was about.

I remember working with over half a billion records and having problems when I
needed a count. I used count(id), but that was mainly internet mantra; I did
not see an improvement. Using Citus gave me a significant improvement, from 7
minutes down to 1. And that was just a single coordinator with two workers on
the same host. It could likely be even better.

If the data is fairly static and writes are very low, triggers are great.
Usually the "close enough" page-based estimate is good if you have over 100k
rows, since the estimate - please correct me if I'm wrong - is sometimes off
by around 1k.

My preference is Citus as a catch-all, but a trigger, a Redis cache managed at
the app level, or page-count estimates are all really useful for stickier
situations.
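For context, the trigger-based counter mentioned here (and described in the article) looks roughly like this. This is only a sketch; the table name `mytable` and the function/trigger names are illustrative, and `EXECUTE PROCEDURE` is used for compatibility with older Postgres versions (newer ones also accept `EXECUTE FUNCTION`):

```sql
-- Illustrative sketch of a trigger-maintained row counter.
CREATE TABLE mytable_count (cnt bigint NOT NULL);
INSERT INTO mytable_count SELECT count(*) FROM mytable;

CREATE FUNCTION mytable_count_trig() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE mytable_count SET cnt = cnt + 1;
    RETURN NEW;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE mytable_count SET cnt = cnt - 1;
    RETURN OLD;
  END IF;
  RETURN NULL;
END;
$$;

CREATE TRIGGER mytable_count_mod AFTER INSERT OR DELETE ON mytable
  FOR EACH ROW EXECUTE PROCEDURE mytable_count_trig();

-- Reading the count becomes a single-row lookup:
SELECT cnt FROM mytable_count;
```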

~~~
groestl
As an alternative to page counts, we used HLLs to estimate (unique)
cardinality, and were quite happy with it. There is a postgres extension
(postgresql-hll) and also a version for the JVM using the same algorithm/data
format.
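A rough sketch of what that looks like with the postgresql-hll extension, assuming it is installed and using a made-up `events(day, user_id)` schema for illustration:

```sql
CREATE EXTENSION hll;

-- One HLL sketch of unique user ids per day (schema is illustrative).
CREATE TABLE daily_uniques (day date PRIMARY KEY, users hll);

INSERT INTO daily_uniques
SELECT day, hll_add_agg(hll_hash_integer(user_id))
FROM events
GROUP BY day;

-- Estimated distinct users on a single day:
SELECT hll_cardinality(users) FROM daily_uniques WHERE day = '2019-01-01';

-- Estimated distinct users across all days (union of the sketches):
SELECT hll_cardinality(hll_union_agg(users)) FROM daily_uniques;
```

The nice property is that the sketches are mergeable, so distinct counts over arbitrary date ranges don't require rescanning the raw rows.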

------
noncoml
Sorry for being a bit off topic, but for anyone out there using PostgreSQL
in production: how do you manage tables with lots of updates? Is autovacuum
doing a good enough job for you, or do you have to run “vacuum full” regularly?

~~~
pageald
For our use case, we found the best approach was to clone all of the data to a
temporary table with indices and constraints disabled, perform the updates,
re-enable indices and constraints, and then replace the production table with
the temporary table. This only works if you are able to update your data in
bulk, and if some lag time in your updates is OK. This also has the benefit of
never locking your production table.
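A rough sketch of that clone-and-swap pattern, with made-up table, column, and index names purely for illustration:

```sql
-- Illustrative sketch: bulk-update a copy, then swap it in.
BEGIN;
CREATE TABLE items_new (LIKE items INCLUDING DEFAULTS);  -- no indexes/constraints yet
INSERT INTO items_new SELECT * FROM items;
UPDATE items_new SET price = price * 1.1;                -- the bulk update (example)
CREATE INDEX items_new_id_idx ON items_new (id);         -- re-create indexes afterwards
ALTER TABLE items RENAME TO items_old;
ALTER TABLE items_new RENAME TO items;
COMMIT;
DROP TABLE items_old;
```

Loading and updating without indexes, then building them once at the end, is what makes the bulk pass fast; the renames at the end are quick metadata operations.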

In situations where real-time updates are important, the key is to minimize
your indices as much as possible. Read up on heap only tuples (HOT). If that
all isn't enough, maybe consider sharding your database.

Never run VACUUM FULL; it locks too aggressively. Let autovacuum do the job.
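If autovacuum can't keep up on a hot table, it can be tuned per table instead of reaching for VACUUM FULL. The values below are illustrative starting points, not recommendations:

```sql
-- Make autovacuum more aggressive for one heavily-updated table.
ALTER TABLE hot_table SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- trigger after ~1% of rows change
  autovacuum_vacuum_cost_delay = 2        -- throttle it less, so it finishes sooner
);
```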

------
hinkley
| A note about count(1) vs count(*). One might think that count(1) would be
faster because count(*) appears to consult the data for a whole row. However
the opposite is true.

Which is the opposite decision of other databases. So sometimes Postgres
_does_ make bad decisions...

~~~
derefr
I think the "the opposite is true" is in reference to the assertion that
"count(*) appears to consult the data for a whole row", not to the proposition
that "one might think that count(1) would be faster [than count(*)]".

count(*) just counts the tuples themselves, which is fast; it's like counting
heap-allocated data structures by counting their pointers (which you're
already walking), without dereferencing those pointers.

count(1) counts the result of evaluating the SQL expression "1" upon landing
on each row, but still walks the same pointers to do so.

So, in terms of time complexity, they're roughly equivalent. Both data items
(the tuple and the SQL constant expression) are already on the stack, ready to
be directly computed upon.

Postgres's count(1) isn't slower than the one in any other DBMS. It's just
their count(*)—at least the expression-evaluation part of it—which is _more_
optimized than the one in other DBMSes. Nothing wrong with that, IMHO.

~~~
rejschaap
> Postgres's count(1) isn't slower than the one in any other DBMS.

count(*) is faster on Postgres than count(1). But both are fundamentally
slow because of MVCC. And count(*) on Postgres (the optimized one on
Postgres) is much slower than count(1) on other databases (the optimized one on
other databases). So practically speaking, counting rows is slower on Postgres
than on other databases.

That said, I love Postgres. I use it every day. There is some room for
improvement, and it does improve all the time. It is an amazing open source
project. And I wouldn't care at all if they never optimize count(1).

~~~
derefr
Yes, I was trying to make a finer point—the _difference_ between count(*) and
count(1) comes down to the cost of _filtering_ the row, and in Postgres
count(1) is a regular filtering operation—taking the same _filtering cost_
that count(1) has in other DBMSes—while count(*) has a _filtering cost_ that
is lower than that of other DBMSes, because it has been specifically
optimized†.

Separately, there's an MVCC cost of _walking_ the rows to filter them, and
other DBMSes optimize walking rows for counting [usually causing both count(1)
and count(*) to be faster], while Postgres does not do this optimization.
(And, as stated in the article, in those DBMSes, this isn't a pure
optimization _per se_ , but is rather a trade-off, trading write speed for all
INSERTs/DELETEs for read speed for this particular case.)

(† Technically, the filtering cost of count(*) hasn't been _specifically_
optimized; the relative speed of filtering tuples for count(*) is an emergent
property of the general fact that Postgres treats any mention of `*` as a
reference to the row-tuple object itself. i.e. If `foo` is a table (x int, y
text), then in actuality, `foo` is first created as a type [a pg_class]
defined as the tuple (int, text); and then the table `foo` is defined as a
relation persisting a rowset of `foo`-tuple-typed rows [in essence making a
table['s triggerable operations] each into a stored procedure with a
`foo`-tuple-typed-rowset return type.] Then, the expressions `(SELECT * FROM
foo)` and `(SELECT f.* FROM foo f)` both evaluate to rowsets of type `foo`,
which means that Postgres doesn't need to dereference the pointer to each
`foo` heap tuple to build those rowsets. It only needs to dereference the
pointers when it comes time to actually serialize and emit the row over the
wire—which, in the case of a `count(*)` operation, never happens.)

------
freekh
It is not count(*) that is slow, it is iterating through the rows that is /
can be slow :)

For me, the trick to basic understanding of perf in PG was exactly this: it is
all about limiting the amount of rows you have to iterate over. It is true for
count(*) but also for every other operation you do.

PG is surprisingly non-magical (at least in my experience) in that you won't
get much perf for free, but on the other hand you can reason about perf &
optimize pretty reliably once you come to terms with this.

------
zeroimpl
Note the trigger approach mentioned in the article would be terrible for
performance in a concurrent environment, since only one transaction could
modify the whole table at a time.

~~~
simonw
I wonder if there's a way to "shard" these counters to avoid this problem.

If you had 10 different counters (maybe in ten different tables) and a
mechanism for round-robin or randomly selecting which counter gets
incremented/decremented would that allow ten concurrent transactions at once?

The query to return the total count would then need to sum the 10 individual
counters, which should be extremely fast.

Or is the concurrency limitation here caused by the trigger on the counted
table itself, not the writes performed by the trigger?
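The sharded-counter scheme proposed above might look roughly like this (purely illustrative names, ten shards picked at random inside the trigger):

```sql
-- Sketch: split the counter across 10 rows so concurrent
-- transactions usually lock different rows.
CREATE TABLE mytable_count_shards (shard int PRIMARY KEY, cnt bigint NOT NULL);
INSERT INTO mytable_count_shards SELECT s, 0 FROM generate_series(0, 9) s;

-- Inside the trigger body, increment a random shard:
UPDATE mytable_count_shards
SET cnt = cnt + 1
WHERE shard = floor(random() * 10)::int;

-- The total count sums the shards:
SELECT sum(cnt) FROM mytable_count_shards;
```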

~~~
ht85
Even though only 1 / 10 counters would be locked, you'd still have to read all
10 to get the count, which would be blocked until the concurrent transaction
ended.

~~~
ants_a
With MVCC, writers don't block readers.

------
brightball
I appreciate approaches that acknowledge when “close” is good enough. It would
be interesting to graph how far off those numbers are to put a real metric to
it.

~~~
nieve
Only one data point, but I've got a custom search engine for a community that
has pretty close to the ideal usage pattern for this: append-only, with no
updates or deletes. A pure count(*) over 6.8M rows on a slow VPS takes
135,015ms and the n_live_tup technique¹ takes 214ms. The latter was 0.009557%
high before an ANALYZE and 0.001414% low afterward. Definitely good enough
for me.

¹ SELECT n_live_tup FROM pg_stat_all_tables WHERE relname = 'comments';

------
javitury
The author claims that the tally method using "mytable_count" is concurrency
safe.

On the one hand, I think that is great. On the other hand, I am still
suspicious.

------
type0
This title should be "A quick explanation of why PostgreSQL count(*) is slow"
or something like that

------
j16sdiz
Nothing to see here, move along.

Been using this since 2009
[https://wiki.postgresql.org/wiki/Count_estimate](https://wiki.postgresql.org/wiki/Count_estimate)
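For anyone who hasn't seen that page, the simplest technique it describes reads the planner's statistics instead of scanning the table (`mytable` is a placeholder):

```sql
-- Estimated row count from planner statistics; refreshed by
-- VACUUM / ANALYZE / autovacuum, so it is approximate.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'mytable'::regclass;
```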

~~~
davidgould
Be careful with this if your postgresql release is not up to date. There was a
bug in the way ANALYZE updated pg_class.reltuples that could cause the value
in reltuples to grow incorrectly for large tables. If the table had a lot of
updates the pg_class.reltuples value would tend to increase a bit each time it
got updated.

This was fixed in March 2018 and was backpatched, so any binary release since
then should be ok, e.g. 9.x.latest, 10.x.latest, 11.x.

See [0] for details of the bug.

[0] [https://www.postgresql.org/message-id/flat/20180117164916.3f...](https://www.postgresql.org/message-id/flat/20180117164916.3fdcf2e9%40engels)

