When I read this post, I find the HammerDB TPROC-C benchmark results in Figure 1 to be super interesting, too. When you compare Citus 9.4 performance with distributed tables and distributed functions vs. Citus 9.4 with regular Postgres tables and regular Postgres functions (on a single node) -- then the multiplier is an even more impressive ~19X. :)
In Citus 8, the coordinator node would run stored procedures. Citus 9.x introduced changes to push down stored procedures to worker nodes. So, if your stored procedures mostly touches local shards, Citus 9 significantly cuts down network traffic.
What's kind of unique about the distributed functions feature in Citus that's described in the blog post is that you can scale stored procedures horizontally if procedure calls query co-located shards, but it doesn't restrict you from doing queries and transactions across all shards.
Happy to answer any questions or debate the merits/evils of stored procedures.