
PostgreSQL 11: something for everyone - petergeoghegan
https://lwn.net/Articles/764515/
======
Dowwie
Note the following passage:

"It's clear to me that a certain proportion of application developers are
allergic to putting business logic in the database system, despite the fact
that batch processing logic is often a great deal faster when implemented that
way. Perhaps the fact that clients are now able to cede full control of
transaction scope to procedural code running on the database server (code that
can be written in a familiar scripting language) will help application
developers to get over the understandable aversion. Hope springs eternal."

~~~
danpalmer
I'm one of these application developers, and I'm still seeing a lot standing
in the way of moving logic to the database.

From coding in Python with Django, I'd need to give up version control as I
know it, use a different Python environment, give up unit testing as I know it
(and as it integrates with the rest of my application), authentication
(because that's owned by Django), distributed tasks and queueing, performance
monitoring (as I know it), etc.

I would really like to see a proof of concept of a modern web framework like
Django/Rails, but with a strong focus on moving as much to the database as
possible - user management, queues, etc. - as I think there are many benefits
to be had from pushing more into the database.

Until something like that exists, though, I think moving logic to the database
means losing the development tooling and processes we're used to, and ends up
re-implementing logic in multiple places, which isn't great.

~~~
zie
We don't use Django, but we basically do this: we put Python in the DB
(PL/pythonu), keep it in a VCS repo, and use liquibase[0] to handle getting it
into the DB, along with our schemas, etc.

Debugging can be a little harder, but testing generally isn't: you know the
calling signature(s) PG uses, so writing a class for unittest/pytest is
straightforward.
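
To illustrate, here's a minimal sketch of the kind of function we keep in the
repo this way (table and column names are hypothetical):

      -- deployed via liquibase like any other schema change
      CREATE OR REPLACE FUNCTION order_total(p_order_id int)
      RETURNS numeric AS $$
          # plpy is the interface PL/Python provides to the calling session
          plan = plpy.prepare(
              "SELECT sum(price * qty) AS total FROM line_item WHERE order_id = $1",
              ["int"])
          return plpy.execute(plan, [p_order_id])[0]["total"]
      $$ LANGUAGE plpythonu;

Since the calling signature is fixed, a test just runs `SELECT
order_total(42)` over a test connection and asserts on the result.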

Our APP logins _ARE_ DB logins for our desktop product(s), but for the web we
have a single DB user that web apps log into the DB as. We'd love for web
logins to become PG logins as well, but we haven't found a sufficiently
performant way of doing that yet; PG sessions have a heavy startup cost.

The functionality isn't the same between the web and desktop product(s),
which helps a lot.

We push access control into the DB for the most part, using the row-, column-
and table-level access controls PG provides. So our desktop product has
basically no access controls in it, since the DB provides all of that at login
time. We have a few power users who connect directly with SQL, and we have
zero issues just handing them access, since the DB controls what they can and
cannot see just fine.
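
As a sketch of what that looks like (names are made up, and row-level
policies in this form arrived in PG 9.5):

      -- column-level: alice can only read these two columns
      GRANT SELECT (id, title) ON document TO alice;

      -- row-level: each login role sees only its own rows
      ALTER TABLE document ENABLE ROW LEVEL SECURITY;
      CREATE POLICY own_docs ON document
          USING (owner = current_user);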

Sadly, so many tools just assume their DB connection is the equivalent of root
and can go do whatever it wants, which makes lots of tools, especially
reporting tools, hard to hand directly to end users.

[0] [http://www.liquibase.org/](http://www.liquibase.org/)

~~~
camel_Snake
I think PostgREST[0] takes an approach to authorization you may want to look
into. My understanding is that it uses one base user that is granted the
ability to switch to a role reflecting the current user, once they're
authenticated by your app server.

Have you transitioned your business logic into your DB from your app, or is
that approach something you started with?

[0]
[http://postgrest.org/en/stable/auth.html](http://postgrest.org/en/stable/auth.html)

~~~
zie
We started with it. It's always best to offload your security/privacy posture
onto something well known and supported rather than rolling your own, if
possible. It's hard stuff to get right, and PG does it really well, especially
in more recent versions. Row/Column security in the 8.X series was a bit
painful :)

We looked into doing something similar, but it didn't scale to our workload
and had a ton of edge cases we kept getting caught up in. If a breach of the
PG connection happens with that approach, the world gets super bad, super
fast, since, if I remember right, you needed to be a superuser to switch roles
like that, and exposing superuser credentials to a website is very, very
worrisome. If I remember right, we also had to throw the connection away after
using it for some reason, so the startup cost only gets amortized so far; if
you have a mass influx of users and your pool size is not gigantic, it gets
painful waiting for new PG connections to come up, plus there's the cost of
holding open so many PG connections if your pool size is too large... And our
product has a module that all users hit pretty much all at once, twice a day.
This was back in the 8.X series of PG, so maybe it's a lot better now...

If PostgREST manages to scale to our workload and takes care of all the edge
cases well, that would be super awesome. I'll have to check into how they
handle it... maybe it's something we can look into adding in the future,
thanks!

~~~
steve-chavez
> you need to be a super user to switch roles like that

No need for superuser; only membership in the role it's going to switch to is
required, i.e. a `GRANT web_user TO connector`. Besides that, the only
privilege the connector role requires is LOGIN.
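
A rough sketch of the whole flow (role names are just illustrative):

      CREATE ROLE connector LOGIN;   -- the only role the web app logs in as
      CREATE ROLE web_user NOLOGIN;
      GRANT web_user TO connector;   -- plain membership, no superuser involved

      -- per request, after the app server has authenticated the user:
      SET ROLE web_user;             -- queries now run with web_user's privileges
      RESET ROLE;                    -- back to connector when the request is done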

------
clarkevans
Now, if I could only have a pljulia... so that I could run statistical
analysis (packaged as a Julia function) from a query. With PostgreSQL the
request queue is never ending ;)

~~~
grzm
I don't know the underlying engine for Julia, but I suspect where there's a
will, there's a way. Have you considered either writing it yourself or
sponsoring someone to work on it?

------
faitswulff
Curious to see if the emphasis on parallelism will affect Citus Data's
business model?

~~~
craigkerstiens
At Citus we're actually very excited about the parallelism improvements within
Postgres. The core of what we do is let you scale horizontally beyond a single
node. We view the parallelism work within Postgres as complementary: we can
focus on the distributed aspects while letting each node handle its own
workload.

Some of our customers today very much leverage the parallelism Citus brings,
but again, that's across multiple nodes. And a number of our customers don't
choose Citus for real-time analytics at all; instead they're focused on a more
standard OLTP application, often a multi-tenant app, that they need to scale.

------
chinhodado
A bit off topic but I wish postgres came with a good desktop app. pgadmin 4 is
just abysmal compared to e.g. MySQL Workbench.

~~~
okcwarrior
I know it's somewhat pricey, but DataGrip by JetBrains is amazing.

~~~
kyriakos
And if you don't work only with DBs, you can get it bundled with their IDEs;
e.g. PhpStorm has most of DataGrip's functionality built in.

------
tomrod
I'm fairly new to PostgreSQL, and am not an expert (just a user) in many SQL
systems. But if Postgres' typecasting methodology is unique to it, by Jove,
that's a great point in its favor.

What other features should I learn about (in comparison to MySQL/Oracle)? Any
reason to focus on Postgres/Greenplum over a Hadoop Hive/Impala setup?

~~~
sitharus
Other DBMSs have typecasting similar to PostgreSQL's, but they're all
different; it's one of those things that was never really standardised.
Personally, I like the breadth of types available in PostgreSQL. First-class
enums and ranges really help a lot of apps, and while table inheritance has
issues, it does have some really interesting uses.
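
For example (a sketch; the exclusion constraint needs the btree_gist
extension):

      CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

      CREATE EXTENSION btree_gist;
      CREATE TABLE booking (
          room   int,
          during tsrange,
          -- ranges plus exclusion constraints: no overlapping bookings per room
          EXCLUDE USING gist (room WITH =, during WITH &&)
      );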

As for features to look into, I'd start with the in-server functions (plpgsql,
plv8, plperl, etc.) for extensibility. Custom and complex types are pretty
awesome as well, and the range of functionality in window functions can make
some really interesting queries possible. The concurrency model also needs
some understanding, as it means a commit will fail rather than deadlock.
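
For instance, a per-customer running total is a one-liner with a window
function (table and column names made up):

      SELECT customer_id,
             order_date,
             sum(amount) OVER (PARTITION BY customer_id
                               ORDER BY order_date) AS running_total
        FROM orders;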

Why should you use PostgreSQL over Hadoop? If you have less than ~20TB of data,
it's much less complex to set up and maintain, and as long as you don't want
to read the entire data set at once it's fast. If you have more but it's
easily partitioned (e.g. multi-tenant), then Citus has a package that can
help, and if you have column-oriented data Citus has a different extension for
that.

Postgres is also really well tested and has decades of production use behind
it, so backups and restores are well understood, though they do require manual
scripting to copy the files.

~~~
akira2501
> Personally I like the breadth of different types available in PostgreSQL.

Also for how extensible they are. I'm a complete amateur at GIS, but PostGIS
is one of the neatest large-scale uses of the system I've seen.

------
atombender
Anyone know what the filter evaluation redesign they did in version 10 was?

> The v10 improvement essentially flattened the representation of expression
> state and changed how expression evaluation was performed.

It sounds like they were using an expression tree and changed it to something
like truth tables, which are essentially "flat" expression representations
that can support any boolean expression.

~~~
anarazel
> It sounds like they were using an expression tree and changed it to
> something like truth tables, which are essentially "flat" expression
> representations that can support any boolean expression.

I don't think that's an accurate description - it's more that we converted the
tree representation (which still exists at parse/plan time) into an
opcode/bytecode representation, evaluated by a mini-VM. Obviously that VM is
quite specialized and not general purpose (and currently it's, afaik, not
internally Turing complete, due to the lack of backward jumps, but that's down
to the program generator rather than the VM itself). The VM uses computed-goto
based dispatch if available (even though on modern x86 hardware the benefits
aren't as large as on older HW).

~~~
atombender
Makes sense, thanks. I thought maybe this was a smaller change where you just
used a different layout/representation of the tree that allowed faster, more
cache-friendly evaluation, but you've actually gone much further with this VM,
which is fascinating given that it opens up the possibility of JITing queries.

My day job involves working on a query engine that faces some of the same
design challenges, so I find this stuff very interesting.

~~~
anarazel
> but you've actually gone much further with this VM, which is fascinating
> given that it opens up the possibility of JITing queries.

That's why I did it, and why that's the first part in PG 11 that got JITed ;)

> My day job involves working on a query engine that faces some of the same
> design challenges, so I find this stuff very interesting.

FWIW, the design decisions I made for this stuff in PG were driven to a
significant degree by the need to adapt things incrementally without breaking
too many idiosyncrasies (although there were some breaks, in another commit,
around multiple set-returning functions in the SELECT list). I'd definitely do
some things differently if I were designing on a green field.

------
juststeve
I am enjoying it much more than SQL Server.

------
noncoml
How is it with regard to vacuuming and DB size?

------
wenc
I'm surprised covering indexes are only now appearing in 11. They're fairly
commonplace in other databases.

~~~
geophile
I believe HOT provided some of the advantages of covering indexes.

Covering indexes are tricky in Postgres, due to its MVCC implementation.
Versioning info is in the table, not the index, so the index may have stale
data. The table itself has to be consulted to find out whether the data in the
index is actually visible to your transaction.

~~~
petergeoghegan
You're talking about index-only scans. Those appeared in Postgres 9.2, which
came out about 5 years ago.

INCLUDE/covering indexes are indexes that can include extra "payload" columns,
and exist only for the benefit of index-only scans. You can add something
extra to the primary key for the largest table in your database, all without
changing how and when duplicate violation errors are raised.
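
For example (a sketch with a hypothetical table):

      -- unique on order_id alone; customer_id and total are pure payload
      CREATE UNIQUE INDEX orders_pk_covering
          ON orders (order_id) INCLUDE (customer_id, total);

      -- can now be satisfied by an index-only scan:
      SELECT customer_id, total FROM orders WHERE order_id = 42;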

------
the_duke
Would be nice to see some JIT benchmarks with sample queries and the JIT
speedup.

~~~
anarazel
It's a bit hard to decide what kind of benchmark to choose - you can easily
get into 3-5x territory with arbitrarily expression-heavy queries (say,
evaluating math equations in WHERE) - but whether that's realistic is another
question.

    
    
      CREATE TABLE large(
          id serial NOT NULL,
          tz timestamptz DEFAULT clock_timestamp() NOT NULL,
          data_double float8 DEFAULT random() NOT NULL,
          data_text text DEFAULT (random()::text) NOT NULL,
          data_other text NOT NULL);
    
      INSERT INTO large(data_other) SELECT generate_series(1, 100000000) i;
    
      SELECT pg_prewarm('large');
    

yields (best of three)

    
    
      postgres[23419][1]=# set jit = 1;
      
      postgres[23419][1]=# EXPLAIN ANALYZE SELECT min(tz), sum(data_double), max(data_double) FROM large WHERE tz > '2018-09-23 14:47:30.298302-07';
      ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
      │                                                                    QUERY PLAN                                                                     │
      ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
      │ Finalize Aggregate  (cost=1495625.42..1495625.43 rows=1 width=24) (actual time=3871.540..3871.540 rows=1 loops=1)                                 │
      │   ->  Gather  (cost=1495625.20..1495625.41 rows=2 width=24) (actual time=3871.388..3873.705 rows=3 loops=1)                                       │
      │         Workers Planned: 2                                                                                                                        │
      │         Workers Launched: 2                                                                                                                       │
      │         ->  Partial Aggregate  (cost=1494625.20..1494625.21 rows=1 width=24) (actual time=3856.050..3856.050 rows=1 loops=3)                      │
      │               ->  Parallel Seq Scan on large  (cost=0.00..1417317.00 rows=10307760 width=16) (actual time=98.533..2216.673 rows=33333333 loops=3) │
      │                     Filter: (tz > '2018-09-23 14:47:30.298302-07'::timestamp with time zone)                                                      │
      │ Planning Time: 0.041 ms                                                                                                                           │
      │ JIT:                                                                                                                                              │
      │   Functions: 8                                                                                                                                    │
      │   Generation Time: 0.934 ms                                                                                                                       │
      │   Inlining: true                                                                                                                                  │
      │   Inlining Time: 5.145 ms                                                                                                                         │
      │   Optimization: true                                                                                                                              │
      │   Optimization Time: 55.719 ms                                                                                                                    │
      │   Emission Time: 43.121 ms                                                                                                                        │
      │ Execution Time: 3874.702 ms                                                                                                                       │
      └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
      (17 rows)
      
      
      postgres[23419][1]=# set jit = 0;
      
      
      postgres[23419][1]=# EXPLAIN ANALYZE SELECT min(tz), sum(data_double), max(data_double) FROM large WHERE tz > '2018-09-23 14:47:30.298302-07';
      ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
      │                                                                    QUERY PLAN                                                                    │
      ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
      │ Finalize Aggregate  (cost=1495625.42..1495625.43 rows=1 width=24) (actual time=5214.622..5214.622 rows=1 loops=1)                                │
      │   ->  Gather  (cost=1495625.20..1495625.41 rows=2 width=24) (actual time=5214.468..5216.057 rows=3 loops=1)                                      │
      │         Workers Planned: 2                                                                                                                       │
      │         Workers Launched: 2                                                                                                                      │
      │         ->  Partial Aggregate  (cost=1494625.20..1494625.21 rows=1 width=24) (actual time=5179.552..5179.552 rows=1 loops=3)                     │
      │               ->  Parallel Seq Scan on large  (cost=0.00..1417317.00 rows=10307760 width=16) (actual time=0.012..2671.871 rows=33333333 loops=3) │
      │                     Filter: (tz > '2018-09-23 14:47:30.298302-07'::timestamp with time zone)                                                     │
      │ Planning Time: 0.111 ms                                                                                                                          │
      │ Execution Time: 5216.123 ms                                                                                                                      │
      └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
      (9 rows)
    

Now obviously those queries are somewhat nonsensical, but IMO they're not an
unrealistic representation of the kinds of operations that happen in analytics
queries. If you disable parallelism / add more aggregates / more filters, the
benefits get bigger. If you add more joins etc., the benefits get smaller
(because the overhead is in parts that don't benefit from JITing).
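
For anyone wanting to experiment: whether JIT kicks in is driven by plan-cost
thresholds, so cheap queries skip it entirely (these should be the PG 11
defaults, if I recall correctly):

      SET jit = on;
      SET jit_above_cost = 100000;           -- JIT-compile above this cost
      SET jit_inline_above_cost = 500000;    -- also inline operators/functions
      SET jit_optimize_above_cost = 500000;  -- also run LLVM optimization passes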

~~~
anarazel
Oops, those included per-row timing overhead. Here's the same without that:

    
    
      postgres[24261][1]=# set jit = 1;
      
      postgres[24261][1]=# EXPLAIN (ANALYZE, TIMING OFF) SELECT min(tz), sum(data_double), max(data_double) FROM large WHERE tz > '2018-09-23 14:47:30.298302-07';
      ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
      │                                                         QUERY PLAN                                                          │
      ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
      │ Finalize Aggregate  (cost=1495625.42..1495625.43 rows=1 width=24) (actual rows=1 loops=1)                                   │
      │   ->  Gather  (cost=1495625.20..1495625.41 rows=2 width=24) (actual rows=3 loops=1)                                         │
      │         Workers Planned: 2                                                                                                  │
      │         Workers Launched: 2                                                                                                 │
      │         ->  Partial Aggregate  (cost=1494625.20..1494625.21 rows=1 width=24) (actual rows=1 loops=3)                        │
      │               ->  Parallel Seq Scan on large  (cost=0.00..1417317.00 rows=10307760 width=16) (actual rows=33333333 loops=3) │
      │                     Filter: (tz > '2018-09-23 14:47:30.298302-07'::timestamp with time zone)                                │
      │ Planning Time: 0.121 ms                                                                                                     │
      │ JIT:                                                                                                                        │
      │   Functions: 8                                                                                                              │
      │   Inlining: true                                                                                                            │
      │   Optimization: true                                                                                                        │
      │ Execution Time: 2440.328 ms                                                                                                 │
      └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
      (13 rows)
      
      Time: 2440.891 ms (00:02.441)
      
      postgres[24261][1]=# set jit = 0;
      
      postgres[24261][1]=# EXPLAIN (ANALYZE, TIMING OFF) SELECT min(tz), sum(data_double), max(data_double) FROM large WHERE tz > '2018-09-23 14:47:30.298302-07';
      ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
      │                                                         QUERY PLAN                                                          │
      ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
      │ Finalize Aggregate  (cost=1495625.42..1495625.43 rows=1 width=24) (actual rows=1 loops=1)                                   │
      │   ->  Gather  (cost=1495625.20..1495625.41 rows=2 width=24) (actual rows=3 loops=1)                                         │
      │         Workers Planned: 2                                                                                                  │
      │         Workers Launched: 2                                                                                                 │
      │         ->  Partial Aggregate  (cost=1494625.20..1494625.21 rows=1 width=24) (actual rows=1 loops=3)                        │
      │               ->  Parallel Seq Scan on large  (cost=0.00..1417317.00 rows=10307760 width=16) (actual rows=33333333 loops=3) │
      │                     Filter: (tz > '2018-09-23 14:47:30.298302-07'::timestamp with time zone)                                │
      │ Planning Time: 0.041 ms                                                                                                     │
      │ Execution Time: 4140.116 ms                                                                                                 │
      └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
      (9 rows)
      
      Time: 4140.499 ms (00:04.140)

~~~
anarazel
FWIW, the non-parallel version yields a profile like:

    
    
      -   25.69%  postgres  jitted-24373-18.so             [.] evalexpr_3_3
         - 96.00% evalexpr_3_3
              ExecAgg
         + 2.89% __isnan
      -   12.15%  postgres  jitted-24373-16.so             [.] evalexpr_3_0
         - 99.38% evalexpr_3_0
              ExecSeqScan
              ExecAgg
         + 0.56% ExecSeqScan
      -   11.74%  postgres  postgres                       [.] heap_getnext
           99.83% heap_getnext
              ExecSeqScan
              ExecAgg
      +    9.31%  postgres  postgres                       [.] ExecSeqScan
      +    7.66%  postgres  libc-2.27.so                   [.] __memmove_avx_unaligned_erms
      +    5.61%  postgres  postgres                       [.] heapgetpage
      +    5.36%  postgres  postgres                       [.] ExecAgg
      +    5.35%  postgres  postgres                       [.] MemoryContextReset
      +    4.11%  postgres  postgres                       [.] HeapTupleSatisfiesMVCC
      +    3.53%  postgres  postgres                       [.] ExecStoreTuple
      +    2.55%  postgres  postgres                       [.] hash_search_with_hash_value
      +    1.63%  postgres  libc-2.27.so                   [.] __isnan
      +    1.16%  postgres  postgres                       [.] CheckForSerializableConflictOut
    
    

so it's pretty clear that there's plenty of additional work that can be done
to improve the generated code (it's pretty crappy right now), and that there's
potential for further improvement from batching (mainly for increased cache
efficiency).

------
lngnmn1
Love Pirsig's analogies, and I can't resist pointing out that a "classic
approach" will always beat ignorant, hype-driven bullshit in the long run.

Informix (now dead?) vs. Oracle is another classic example from the past.

Oracle's domination absolutely does not imply its technical superiority;
actually, popularity is usually a bad sign (hello, Ayn Rand) - Java and junk
food are insanely popular.

~~~
lngnmn1
It is useless to cowardly downvote an unpopular opinion without giving any
reasons, or because you don't like the style of a comment. Cosplay of
political correctness hurts the truth, and sometimes form (wording) matters a
lot.

A well-researched thing will always beat fast-coded-to-market crap: Erlang
(despite all its syntactic ugliness), Go (especially the stdlib and runtime,
which are hard), Scala, Redis, nginx, Haskell (monad madness aside), Clojure,
etc.

The ability to produce a lot of crappy code quickly is no substitute for
decent academic research with proper attention to detail. That is the point.
Postgres was a research vehicle in the past, now polished into a state-of-the-
art product, just like SQLite.

How is MongoDB, by the way? Still a default choice or it has been recently
obsoleted by a blockchain?

