PostgreSQL 11: something for everyone (lwn.net)
201 points by petergeoghegan on Sept 22, 2018 | 68 comments


Note the following passage:

"It's clear to me that a certain proportion of application developers are allergic to putting business logic in the database system, despite the fact that batch processing logic is often a great deal faster when implemented that way. Perhaps the fact that clients are now able to cede full control of transaction scope to procedural code running on the database server (code that can be written in a familiar scripting language) will help application developers to get over the understandable aversion. Hope springs eternal."


That bothered me. I've worked on applications that store business logic in the database, and they are a lot harder to maintain and verify. You could say the same thing about assembly, but there's a reason nobody's jumping on that.


I'm one of these application developers, and I still see a lot standing in the way of moving logic to the database.

Coming from Python with Django, I'd need to give up version control as I know it, use a different Python environment, and give up unit testing as I know it (and as it integrates with the rest of my application), authentication (because that's owned by Django), distributed tasks and queueing, performance monitoring (as I know it), etc.

I would really like to see a Proof of Concept of a modern web framework like Django/Rails, but with a strong focus on moving as much to the database as possible - user management, queues, etc, as I think there are many benefits to be had from pushing more into the database.

Until something like that exists, though, I think moving logic to the database means losing the development tooling and processes we are used to, and ending up re-implementing logic in multiple places, which isn't great.


That statement is about batch processing, like ETL work, that can and should be close to the data. SQL is still just code and can be saved in git along with a CI/CD deployment process as well.

OLTP transactions for running a basic webapp don't need to move there, and the data still needs to be retrieved anyway, but for things like a queue, the actual push and pop operations can easily be wrapped in SQL functions that are easy and safe to call from multiple apps.
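For illustration, here's a minimal sketch of that queue idea (table and function names are made up, and SKIP LOCKED needs PG 9.5+):

  CREATE TABLE job_queue (
      id      bigserial PRIMARY KEY,
      payload jsonb     NOT NULL,
      taken   boolean   NOT NULL DEFAULT false
  );

  -- push is just an INSERT wrapped in a function
  CREATE FUNCTION queue_push(p jsonb) RETURNS bigint AS $$
      INSERT INTO job_queue (payload) VALUES (p) RETURNING id;
  $$ LANGUAGE sql;

  -- pop claims one row, skipping rows other workers have already locked
  CREATE FUNCTION queue_pop() RETURNS jsonb AS $$
      UPDATE job_queue
         SET taken = true
       WHERE id = (SELECT id FROM job_queue
                    WHERE NOT taken
                    ORDER BY id
                    LIMIT 1
                    FOR UPDATE SKIP LOCKED)
      RETURNING payload;
  $$ LANGUAGE sql;

Any app that can run `SELECT queue_push('{"task": "send_email"}')` and `SELECT queue_pop()` gets the same locking behaviour without reimplementing it.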


We don't use Django, but we basically do this: we put Python in the DB (PL/pythonu), host it in a VCS repo, and use liquibase[0] to handle getting it into the DB, along with our schemas, etc.

Debugging can be a little harder, but testing generally isn't: you know the calling signature(s) PG uses, so writing a class for unittest/pytest is straightforward.
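To make that concrete, here's a toy example of the kind of function that lives in the repo (the name and logic are invented; plpython3u would be the Python 3 variant):

  CREATE OR REPLACE FUNCTION normalize_phone(raw text) RETURNS text AS $$
      # plain Python, executed inside the server
      import re
      return re.sub(r'\D', '', raw or '')
  $$ LANGUAGE plpythonu;

A test is then just a query over an ordinary connection, e.g. asserting that `SELECT normalize_phone('(555) 123-4567')` returns '5551234567' from a pytest fixture.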

Our APP logins ARE DB logins for our desktop product(s), but for web we have a single DB user that web apps log into the DB as. We'd love for web logins to become PG logins as well, but we haven't found a very performant way of doing that yet; PG sessions carry a heavy startup cost.

The functionality for us isn't the same between the web and desktop product(s), which helps a lot.

We push access control into the DB for the most part, using the row-, column-, and table-level access controls PG provides. So our desktop product has basically no access controls in it, since the DB provides all of that at login time. We have a few power users that connect directly with SQL, and we have zero issues just handing them access, since the DB controls what they can/cannot see just fine.
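Roughly what that looks like, for anyone who hasn't seen it (table and role names are invented, and row-level policies like this need 9.5+):

  -- column- and table-level control is plain GRANT
  GRANT SELECT (id, name, region) ON customers TO sales_role;

  -- row-level security ties visibility to whoever is logged in
  ALTER TABLE customers ENABLE ROW LEVEL SECURITY;
  CREATE POLICY own_region_only ON customers
      USING (region = (SELECT home_region FROM staff WHERE login = current_user));

Once that's in place, a direct psql session and the desktop app see exactly the same slice of the data.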

Sadly, so many tools just assume their DB connection is the equivalent of root and can go do whatever it wants, which makes lots of tools, especially reporting tools, hard to hand directly to end users.

[0] http://www.liquibase.org/


I think Postgrest[0] takes an approach to authorization you may want to look into. My understanding is they use one base user that is granted the ability to switch roles to reflect the current user once they're authed by your app server.

Have you transitioned your business logic into your DB from your app or is that approach something you started with?

[0] http://postgrest.org/en/stable/auth.html


We started with it. It's always best to offload your security/privacy posture onto something well known and supported rather than rolling your own, if possible. It's hard stuff to get right, and PG does it really well, especially in more recent versions. Row/Column security in the 8.X series was a bit painful :)

We looked into doing something similar, but it didn't scale to our workload and had a ton of edge cases we kept getting caught up in. If a breach of the PG connection happens with that approach, the world gets super bad, super fast, since, if I remember right, you need to be a superuser to switch roles like that. Exposing superuser credentials like that to a website is very, very worrisome. If I remember right, we had to throw the connection away after using it for some reason, so the startup-cost savings get cut some, but if you have a mass influx of users and your pool size is not gigantic, it gets painful waiting for new PG connections to come up. Plus the cost of holding open so many PG connections if your pool size is too large... And our product has a module where all users attack it pretty much all at once, twice a day. This was back in the 8.X series of PG, so maybe it's a lot better now...

If postgrest manages to make it scale to our workload and takes care of all the edge cases well, it would be super awesome. I'll have to check into how they handle that... maybe it's something we can look into adding in the future, thanks!


> you need to be a super user to switch roles like that

No need for superuser; only membership in the role it's going to switch to is required, i.e. a `GRANT web_user TO connector`. Besides that, the only privilege the connector role requires is LOGIN.
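Something along these lines (role names just for illustration):

  CREATE ROLE connector LOGIN PASSWORD '...';
  CREATE ROLE web_user NOLOGIN;
  GRANT web_user TO connector;

  -- after the app server authenticates the end user:
  SET ROLE web_user;
  -- queries now run with web_user's privileges
  RESET ROLE;

SET ROLE succeeds only because connector is a member of web_user; no superuser is involved anywhere.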


I've noticed the same, and will compare it to graphics programming.

Loads of GUI applications write their UIs in application code. Eventually they have to reach down to an actual graphics buffer, or OpenGL or a similar lib, but only in the most simplistic, elementary way possible. (Sort of like `SELECT * FROM user WHERE id = 4`.)

That keeps 98% of your logic in one place, with one language, one composition paradigm, and one set of tools.

Programs with demanding graphics needs, however, will start pushing greater and greater amounts of code into shaders, to be run on the GPU. It's more complex to split logic like this, and the capabilities of shader languages are far beneath those of general-purpose programming languages. But when it matters, you'll happily take the improved performance.

It's great to see PostgreSQL doing what it can to close the gap and make this easier.

tl;dr

SQL queries are the graphics shaders of server programs.


Now, if I could only have a pljulia... so that I could run statistical analysis (packaged as a Julia function) from a query. With PostgreSQL the request queue is never ending ;)


I don't know the underlying engine for Julia, but I suspect where there's a will, there's a way. Have you considered either writing it yourself or sponsoring someone to work on it?


Embedding Julia is not hard (see "calling Julia function from C code"):

https://docs.julialang.org/en/v1/manual/embedding/


Curious to see if the emphasis on parallelism will affect Citus Data's business model?


At Citus we're actually very excited about the parallelism improvements within Postgres. For us at Citus, the core of what we do is allow you to scale horizontally beyond a single node. We view the parallelism work within Postgres as complementary, as we can focus on the distributed aspects while letting each node handle its own workload.

Some of our customers today do very much leverage the parallelism Citus brings, but again, that's across multiple nodes. A number of our customers don't choose Citus for real-time analytics at all; instead they're focused on a more standard OLTP application, often a multi-tenant app, that they need to scale.


A bit off topic but I wish postgres came with a good desktop app. pgadmin 4 is just abysmal compared to e.g. MySQL Workbench.


dbeaver-community does the day-to-day stuff really well. One great feature is built-in SSH tunneling. There are also frequent updates.

The only thing I haven't been able to find a great solution for yet is comparing schemas between DB instances and generating DDL to make up the difference.


> solution for yet is comparing schemas between DB instances and generating DDL to make up the difference.

Have you looked at liquibase diff[0] before?

I created yet another python wrapper[1] for the jar to integrate it into my workflow for easy DDL generation as I need to compare databases.

[0]: https://www.liquibase.org/documentation/diff.html

[1]: https://github.com/Morabaraba/python-liquibase


I don't remember why I dismissed liquibase when I was looking at the options. Maybe it was seeing that 'Role-Based Authentication' is only supported in the paid version? I ended up getting most of what I needed out of pgquarrel although I think I ended up manually maintaining the auth parts anyway. If/when it comes up again, maybe I'll take another look at liquibase.

Ultimately, yeah, I just got better at keeping track of the changes myself as in the article you link in your GitHub README: http://www.liquibase.org/2007/06/the-problem-with-database-d...


I know it's somewhat pricey but datagrip by jetbrains is amazing.


And if you are not only working with DBs, you can get it bundled with their IDEs. E.g. phpstorm has most of datagrip's functionality built in.


Their offerings aren't pricey for what they provide. DataGrip alone is about $9/mo for a personal license (down to $4.50/mo in the 3rd year if you pay annually), but you'd want to use the bundled one in another IDE if you code.


TablePlus is fantastic. I think it's coming for Windows soon.


Try DBeaver; it's OSS, multi-database, and just about as good as MySQL Workbench or SSMS.

Since I switched I haven't used anything else.


Just out of curiosity, why did this require a throwaway?


I hardly ever post comments and never articles, so I just create an account when it seems worthwhile. The mods don't seem bothered by this as long as such comments aren't flamey.


From the guidelines:

> "Throwaway accounts are ok for sensitive information, but please don't create them routinely. On HN, users should have an identity that others can relate to."

https://news.ycombinator.com/newsguidelines.html

As recently as a week ago, 'dang had this to say about throwaways:

> "HN can't be a community without members. Disembodied comments are not community members. No one is required to use their real name here, but users need to have some consistent identity for others to relate to. Otherwise we might as well have no usernames and no community at all."

https://news.ycombinator.com/item?id=17996295

And later goes on to make clear this includes "users who create throwaway accounts for everything, including uncontroversial stuff".

https://news.ycombinator.com/item?id=17996858

They also don't see everything, or act on everything they see, so in that sense I think you could say they "don't seem bothered." That said, I think it's fair to say it's preferred that you maintain some sort of identity.


I definitely don't create them routinely. Maybe 4 times a year. Consistency isn't worth much if nobody knows who you are because you never post.

The desire to create a community chafes against the zero verification HN does for new accounts. If you want to create a community, it should be harder to create a new account than to log in, but it's actually the opposite. To create a new account I don't need to remember my password, I just need a username and keyspam.

This probably isn't what Dang wants, but it's how the site currently works. If it's easier to create an account a couple of times a year than to remember my login, especially when nobody is going to remember someone who posts 4 times a year, what is the point of maintaining an account? Just being logical here.


DBeaver (free community version) or Jetbrains Rider. Both are cross-platform desktop apps with multiple database support and are very capable and reliable.

If you want something that's webapp-style, perhaps to host as a separate service, then check out OmniDB from 2ndQuadrant. It's still maturing but has some unique management features that are helpful for Postgres.


https://dbeaver.io/ is something you may want to try, really good based on my experience.


I use Postico on Mac.


+1 for Postico. One-time fee, and I've only ever had one or two moments of frustration with it - as opposed to much larger numbers with PGAdmin 3 and 4.


+ 5,000 for Postico. Such excellent software.


I like `psql` quite a lot, if you have your PAGER set to `less`. Also, `\x` can really help view many columns vertically.


I like psql a lot as well. That said, there's often a place for GUIs, and your parent asked specifically about desktop apps.


Sure did, just thought they might also find this information helpful. It makes the `psql` client a bit more GUI-like.
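For example (session is just illustrative):

  $ PAGER=less psql mydb
  mydb=# \x auto
  Expanded display is used automatically.

With `\x auto`, psql switches to the vertical one-field-per-line layout only when a row is too wide for the terminal.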


I see quite a few mentioning DBeaver. I want to mention two other SQL frontends worth looking at: SQL Workbench[1] and SQuirrel SQL[2]. SQL Workbench is particularly well documented, with a tonne of features, some of them integrated with the command line.

[1] https://www.sql-workbench.eu/

[2] http://www.squirrelsql.org/


I can't believe nobody mentioned DbVisualizer. It's by far the best DB tool, and it works with a bunch of them.

Advanced Query Tool is also good but it's geared more for BA than DBA.

Aqua Data Studio is also very good but too expensive ($499 single user licence).

For reference, I've also used pgadmin, DBeaver, DataGrip, and SQuirrel in my two-year-long quest to find the best tool. I was assigned to pick the best tool for 500 developers and business analysts.


Wow, considering I have the worst memories of MySQL Workbench, pgadmin sounds awful indeed.


pgadmin3 was pretty decent. pgadmin4 is junk.


Not sure why you're being downvoted; I have the same sentiment. pgadmin 3 was by no means a great app, but it did the job well. pgadmin 4 is just a mess: it's slow, bloated and buggy well beyond the acceptable range.


I was totally floored by 4. What a bizarre regression. 3 was relatively good so I had my hopes pretty high... I literally never used either of them again. If that's the future of the app, let me out.


Get someone (anyone) to pay for a copy of Aqua Data Studio.


Not free and not open source, but Navicat has been working very well for me over the years.


Pgadmin4 just stopped launching one day, forcing me back to 3. I don't really mind it.


I would use pgadmin 3 too if it didn't pop up a dozen error messages when connecting to a postgres 10 database. There's a pgadmin 3 LTS version, but it only comes bundled with a postgres installer; there's no separate package for it.


Maybe not as feature-rich as others, but check out pgweb.


I'm still fairly new to PostgreSQL, and am not an expert (just a user) in many SQL systems. But if Postgres' typecasting methodology is unique to it, by Jove, it's a great point.

What other features should I learn about (comparisons to MySQL/Oracle)? Any reason to focus on Postgres/Greenplum over a Hadoop hive/impala setup?


Other DBMSs have similar typecasting to PostgreSQL, but they're all different. It's one of those things that never really standardised. Personally I like the breadth of different types available in PostgreSQL. First-class enums and ranges really help a lot of apps, and while table inheritance has issues it does have some really interesting uses.

As for features to look into, I'd look at the in-server functions (plpgsql, plv8, plperl etc.) for extensibility. Custom and complex types are pretty awesome as well. Also, the range of functionality in window functions can make some really interesting queries possible. The concurrency model does need some understanding too, as it means a commit will fail rather than deadlock.
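A small illustrative taste of the enum and range types mentioned above (made-up schema; the exclusion constraint needs the btree_gist extension):

  CREATE EXTENSION IF NOT EXISTS btree_gist;

  CREATE TYPE booking_status AS ENUM ('pending', 'confirmed', 'cancelled');

  CREATE TABLE bookings (
      room    int            NOT NULL,
      during  tstzrange      NOT NULL,
      status  booking_status NOT NULL DEFAULT 'pending',
      -- ranges compose with exclusion constraints: no overlapping bookings per room
      EXCLUDE USING gist (room WITH =, during WITH &&)
  );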

Why should you use PostgreSQL over Hadoop? If you have less than ~20TB of data, then it's much less complex to set up and maintain, and as long as you don't want to read the entire data set at once, it's fast. If you have more but it's easily partitioned (e.g. multi-tenant), then Citus have a package that can help, or if you have column-oriented data, Citus have a different extension to help.

Postgres is also really well tested and has decades of production use behind it, so backups and restores are well known, though that does require manual scripting to copy the files.


> Personally I like the breadth of different types available in PostgreSQL.

Also for how extensible they are. I'm a complete amateur at GIS, but PostGIS is one of the neatest large-scale uses of the system I've seen.


Anyone know what the filter evaluation redesign they did in version 10 was?

> The v10 improvement essentially flattened the representation of expression state and changed how expression evaluation was performed.

It sounds like they were using an expression tree and changed it to something like truth tables, which are essentially "flat" expression representations that can support any boolean expression.


> It sounds like they were using an expression tree and changed it to something like truth tables, which are essentially "flat" expression representations that can support any boolean expression.

I don't think that's an accurate description - it's more that we converted the tree representation (which still exists at parse/plan time) into an opcode/bytecode representation, evaluated by a mini-VM. Obviously that VM is quite specialized, and not general purpose (and currently it's AFAIK not internally Turing complete, due to the lack of backward jumps, but that's just a question of the program generator, rather than the VM itself). The VM uses computed-goto-based dispatch if available (even though on modern x86 hardware the benefits aren't as large as on older HW).


Makes sense, thanks. I thought maybe this was a smaller change where you just used a different layout/representation of the tree that allowed faster, more cache-friendly evaluation, but you've actually gone much further with this VM, which is fascinating given that it opens up the possibility of JITing queries.

My day job involves working on a query engine that faces some of the same design challenges, so I find this stuff very interesting.


> but you've actually gone much further with this VM, which is fascinating given that it opens up the possibility of JITing queries.

That's why I did it, and why that's the first part in PG 11 that got JITed ;)

> My day job involves working on a query engine that faces some of the same design challenges, so I find this stuff very interesting.

FWIW, the design decisions I made for this stuff in PG were driven in significant part by the need to adapt things incrementally without breaking too many idiosyncrasies (although there were some, in another commit, around multiple set-returning functions in the SELECT list). I'd definitely do some things differently if I were designing on a green field.


Replying to myself: Apparently this [1] is the change.

[1] https://www.postgresql.org/message-id/E1crtuL-0002bw-Pq%40ge...


Correct.


I am enjoying it much more than SQL Server.


How is it in regard to vacuuming and DB size?


I'm surprised covering indices are only now appearing in 11. They're fairly commonplace in other databases.


I believe HOT provided some of the advantages of covering indexes.

Covering indexes are tricky in Postgres, due to its MVCC implementation. Versioning info is in the table, not the index, so the index may have stale data. The table itself has to be consulted to find out if the data in the index is actually visible to your transaction.


You're talking about index-only scans. Those appeared in Postgres 9.2, which came out about 5 years ago.

INCLUDE/covering indexes are indexes that can include extra "payload" columns, and exist only for the benefit of index-only scans. You can add something extra to the primary key for the largest table in your database, all without changing how and when duplicate violation errors are raised.
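For example (hypothetical table), the extra columns ride along in the index leaf pages but play no part in uniqueness:

  CREATE UNIQUE INDEX orders_id_covering
      ON orders (id) INCLUDE (customer_name, total);

  -- can now be answered by an index-only scan (visibility map permitting),
  -- while duplicate-key errors are still raised only on id
  SELECT id, customer_name, total FROM orders WHERE id = 42;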


I wonder if it will lead to finally implementing clustered indexes in pgsql and removing the last major obstacle to matching performance with MS SQL.


Would be nice to see some JIT benchmarks with sample queries and the JIT speedup.



It's a bit hard to know what kind of benchmark to choose - you can easily get into 3-5x territory with arbitrarily expression-heavy queries (say, evaluating math equations in WHERE) - but whether that's realistic is another question.

  CREATE TABLE large(id serial not null, tz timestamptz DEFAULT clock_timestamp() NOT NULL, data_double float8 DEFAULT random() NOT NULL, data_text text DEFAULT (random()::text) NOT NULL, data_other text NOT NULL);

  INSERT INTO large(data_other) SELECT generate_series(1, 100000000) i;

  SELECT pg_prewarm('large');
yields (best of three)

  postgres[23419][1]=# set jit = 1;
  
  postgres[23419][1]=# EXPLAIN ANALYZE SELECT min(tz), sum(data_double), max(data_double) FROM large WHERE tz > '2018-09-23 14:47:30.298302-07';
  ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │                                                                    QUERY PLAN                                                                     │
  ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Finalize Aggregate  (cost=1495625.42..1495625.43 rows=1 width=24) (actual time=3871.540..3871.540 rows=1 loops=1)                                 │
  │   ->  Gather  (cost=1495625.20..1495625.41 rows=2 width=24) (actual time=3871.388..3873.705 rows=3 loops=1)                                       │
  │         Workers Planned: 2                                                                                                                        │
  │         Workers Launched: 2                                                                                                                       │
  │         ->  Partial Aggregate  (cost=1494625.20..1494625.21 rows=1 width=24) (actual time=3856.050..3856.050 rows=1 loops=3)                      │
  │               ->  Parallel Seq Scan on large  (cost=0.00..1417317.00 rows=10307760 width=16) (actual time=98.533..2216.673 rows=33333333 loops=3) │
  │                     Filter: (tz > '2018-09-23 14:47:30.298302-07'::timestamp with time zone)                                                      │
  │ Planning Time: 0.041 ms                                                                                                                           │
  │ JIT:                                                                                                                                              │
  │   Functions: 8                                                                                                                                    │
  │   Generation Time: 0.934 ms                                                                                                                       │
  │   Inlining: true                                                                                                                                  │
  │   Inlining Time: 5.145 ms                                                                                                                         │
  │   Optimization: true                                                                                                                              │
  │   Optimization Time: 55.719 ms                                                                                                                    │
  │   Emission Time: 43.121 ms                                                                                                                        │
  │ Execution Time: 3874.702 ms                                                                                                                       │
  └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  (17 rows)
  
  
  postgres[23419][1]=# set jit = 0;
  
  
  postgres[23419][1]=# EXPLAIN ANALYZE SELECT min(tz), sum(data_double), max(data_double) FROM large WHERE tz > '2018-09-23 14:47:30.298302-07';
  ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │                                                                    QUERY PLAN                                                                    │
  ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Finalize Aggregate  (cost=1495625.42..1495625.43 rows=1 width=24) (actual time=5214.622..5214.622 rows=1 loops=1)                                │
  │   ->  Gather  (cost=1495625.20..1495625.41 rows=2 width=24) (actual time=5214.468..5216.057 rows=3 loops=1)                                      │
  │         Workers Planned: 2                                                                                                                       │
  │         Workers Launched: 2                                                                                                                      │
  │         ->  Partial Aggregate  (cost=1494625.20..1494625.21 rows=1 width=24) (actual time=5179.552..5179.552 rows=1 loops=3)                     │
  │               ->  Parallel Seq Scan on large  (cost=0.00..1417317.00 rows=10307760 width=16) (actual time=0.012..2671.871 rows=33333333 loops=3) │
  │                     Filter: (tz > '2018-09-23 14:47:30.298302-07'::timestamp with time zone)                                                     │
  │ Planning Time: 0.111 ms                                                                                                                          │
  │ Execution Time: 5216.123 ms                                                                                                                      │
  └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  (9 rows)
Now obviously those queries are somewhat nonsensical, but IMO they're not an unrealistic representation of the kinds of operations that happen in analytics queries. If you disable parallelism / add more aggregates / more filters, the benefit gets bigger. If you add more joins etc., the benefits get smaller (because the overhead is in parts that don't benefit from JITing).


Oops, those included per-row timing overhead. Here's the same without that:

  postgres[24261][1]=# set jit = 1;
  
  postgres[24261][1]=# EXPLAIN (ANALYZE, TIMING OFF) SELECT min(tz), sum(data_double), max(data_double) FROM large WHERE tz > '2018-09-23 14:47:30.298302-07';
  ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │                                                         QUERY PLAN                                                          │
  ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Finalize Aggregate  (cost=1495625.42..1495625.43 rows=1 width=24) (actual rows=1 loops=1)                                   │
  │   ->  Gather  (cost=1495625.20..1495625.41 rows=2 width=24) (actual rows=3 loops=1)                                         │
  │         Workers Planned: 2                                                                                                  │
  │         Workers Launched: 2                                                                                                 │
  │         ->  Partial Aggregate  (cost=1494625.20..1494625.21 rows=1 width=24) (actual rows=1 loops=3)                        │
  │               ->  Parallel Seq Scan on large  (cost=0.00..1417317.00 rows=10307760 width=16) (actual rows=33333333 loops=3) │
  │                     Filter: (tz > '2018-09-23 14:47:30.298302-07'::timestamp with time zone)                                │
  │ Planning Time: 0.121 ms                                                                                                     │
  │ JIT:                                                                                                                        │
  │   Functions: 8                                                                                                              │
  │   Inlining: true                                                                                                            │
  │   Optimization: true                                                                                                        │
  │ Execution Time: 2440.328 ms                                                                                                 │
  └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  (13 rows)
  
  Time: 2440.891 ms (00:02.441)
  
  postgres[24261][1]=# set jit = 0;
  
  postgres[24261][1]=# EXPLAIN (ANALYZE, TIMING OFF) SELECT min(tz), sum(data_double), max(data_double) FROM large WHERE tz > '2018-09-23 14:47:30.298302-07';
  ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │                                                         QUERY PLAN                                                          │
  ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Finalize Aggregate  (cost=1495625.42..1495625.43 rows=1 width=24) (actual rows=1 loops=1)                                   │
  │   ->  Gather  (cost=1495625.20..1495625.41 rows=2 width=24) (actual rows=3 loops=1)                                         │
  │         Workers Planned: 2                                                                                                  │
  │         Workers Launched: 2                                                                                                 │
  │         ->  Partial Aggregate  (cost=1494625.20..1494625.21 rows=1 width=24) (actual rows=1 loops=3)                        │
  │               ->  Parallel Seq Scan on large  (cost=0.00..1417317.00 rows=10307760 width=16) (actual rows=33333333 loops=3) │
  │                     Filter: (tz > '2018-09-23 14:47:30.298302-07'::timestamp with time zone)                                │
  │ Planning Time: 0.041 ms                                                                                                     │
  │ Execution Time: 4140.116 ms                                                                                                 │
  └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  (9 rows)
  
  Time: 4140.499 ms (00:04.140)


FWIW, the non-parallel version yields a profile like:

  -   25.69%  postgres  jitted-24373-18.so             [.] evalexpr_3_3
     - 96.00% evalexpr_3_3
          ExecAgg
     + 2.89% __isnan
  -   12.15%  postgres  jitted-24373-16.so             [.] evalexpr_3_0
     - 99.38% evalexpr_3_0
          ExecSeqScan
          ExecAgg
     + 0.56% ExecSeqScan
  -   11.74%  postgres  postgres                       [.] heap_getnext
       99.83% heap_getnext
          ExecSeqScan
          ExecAgg
  +    9.31%  postgres  postgres                       [.] ExecSeqScan
  +    7.66%  postgres  libc-2.27.so                   [.] __memmove_avx_unaligned_erms
  +    5.61%  postgres  postgres                       [.] heapgetpage
  +    5.36%  postgres  postgres                       [.] ExecAgg
  +    5.35%  postgres  postgres                       [.] MemoryContextReset
  +    4.11%  postgres  postgres                       [.] HeapTupleSatisfiesMVCC
  +    3.53%  postgres  postgres                       [.] ExecStoreTuple
  +    2.55%  postgres  postgres                       [.] hash_search_with_hash_value
  +    1.63%  postgres  libc-2.27.so                   [.] __isnan
  +    1.16%  postgres  postgres                       [.] CheckForSerializableConflictOut

so it's pretty clear that there's plenty of additional work that can be done to improve the generated code (it's pretty crappy right now), and that there's potential for improving things via batching (mainly for increased cache efficiency).


Love Pirsig's analogies, and can't resist pointing out that a "classic approach" will always beat ignorant hype-driven bullshit in the long run.

Informix (now dead?) vs. Oracle is another classic example from the past.

Oracle's domination absolutely does not imply its technical superiority; actually, popularity is usually a bad thing (hello, Ayn Rand) - Java and junk food are insanely popular.


It is useless to cowardly downvote an unpopular opinion without giving any reasons, or because you don't like the style of a comment. Cosplaying political correctness hurts the truth, and sometimes the form (wording) matters a lot.

A well-researched thing will always beat fast-coded-to-market crap: Erlang (despite all its syntactic ugliness), Go (especially the stdlib and runtime, which are hard), Scala, Redis, nginx, Haskell (monad madness aside), Clojure, etc.

The ability to produce a lot of crappy code quickly is not a substitute for decent academic research with proper attention to detail. This is the point. Postgres was a research vehicle in the past, now polished into a state-of-the-art product, just like SQLite.

How is MongoDB, by the way? Still the default choice, or has it recently been obsoleted by a blockchain?



