Hacker News

While everybody is rightfully going to be excited about logical replication, for me personally, CREATE STATISTICS and the new ROW syntax for UPDATE are the additions with probably the biggest effect on me since I moved to Postgres exclusively when 7.1 was released.

Especially CREATE STATISTICS (wonderful explanation here https://www.postgresql.org/docs/10.0/static/multivariate-sta...) is the one knob I have wanted to turn so many times in the past (especially the n-distinct counts).

Most of the time the planner does an OK job, but sometimes you have tables in a peculiar shape and, if you're unlucky, the planner will opt for horribly bad plans. Whenever this happened to me, it was due to one of the two things CREATE STATISTICS will let me tune in the future.
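For the record, the new knob looks roughly like this (a sketch only; the table and column names are made up, and `dependencies` and `ndistinct` are the two statistics kinds that ship in Pg10):

  -- hypothetical table where city and zip are strongly correlated, so the
  -- planner's column-independence assumption misestimates row counts
  CREATE TABLE addresses (city text, zip text, street text);

  -- teach the planner about the correlation between the two columns
  CREATE STATISTICS addr_city_zip (dependencies, ndistinct)
    ON city, zip FROM addresses;

  ANALYZE addresses;  -- the statistics are gathered on the next ANALYZE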

Thank you, thank you, thank you to whoever gave us this wonderful feature




Like you, I think CREATE STATISTICS is huge. I work with a sharded PostgreSQL set-up where we roll our own sharding based on customer data. This means that most of our tables have compound primary keys where the identifying account data is part of the identifier.

Just speculating off-hand, but this sounds like a schema-defined column dependency, which is doubly troublesome since, like I said, it sits inside our primary key index. I am super excited to see just how much of a difference CREATE STATISTICS will make to overall performance.
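Speculating further with made-up names, the setup would look something like this: the account id is part of every compound primary key, so an extended statistic across the key columns is the obvious candidate.

  -- hypothetical sharded schema: account_id is part of every primary key
  CREATE TABLE orders (
    account_id bigint,
    order_id   bigint,
    status     text,
    PRIMARY KEY (account_id, order_id)
  );

  -- extended statistics across the correlated key columns (Pg10 syntax)
  CREATE STATISTICS orders_key_stats (ndistinct, dependencies)
    ON account_id, order_id FROM orders;
  ANALYZE orders;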


It's difficult to say whether the extended statistics could help with your schema, particularly in Pg10 where we only implemented two simple statistics types - functional dependencies and ndistinct (GROUP BY) coefficients.

Maybe the changes in 9.6 that allow using foreign key constraints during estimation would help, though?


Has anything actually changed with the ROW syntax for UPDATE? Maybe I'm missing something but it looks like the functionality has always been there, but now you're allowed to type the word "ROW". E.g. compare [1] and [2].

[1] https://www.postgresql.org/docs/9.6/static/sql-update.html [2] https://www.postgresql.org/docs/10.0/static/sql-update.html


What's the significance of the `update .. set (..) = row (..) ..` syntax?

  update comment set modified = now(), body = 'edited comment' where id = 123;
vs

  update comment set (modified, body) = row (now(), 'edited comment') where id = 123;
It doesn't seem to provide any new functionality, just a minor difference in syntax.


EDIT: Turns out the below is not true, though you can achieve the same effect with the sub-select syntax.

The row value can be a single value from some other query (rather than having to pick apart each column from the row value). That said, I think the feature has been there for a while; what's new is just that the "ROW" keyword is now optionally allowed.


Thanks for the explanation!

  update comment set (modified, body) = (select now(), 'edited comment') where id = 123;
You're right, it works the same without the `row` keyword in 9.6.


Actually, now that I re-read the grammar, sub-selects are yet another accepted syntax. Looks like the ROW syntax doesn't support row expressions like I thought. Not really sure of the benefit besides making it easier to programmatically construct UPDATEs.


Here's the commit where the "UPDATE .. SET (col, ..) = ROW (col, ..)" syntax was added as an alternative to "UPDATE .. SET (col, ..) = (col, ..)": https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...

I don't quite understand the commit message, but I think the new syntax resolves some ambiguity in the grammar in an edge case. From the diff of src/test/regress/sql/update.sql:

  -- *-expansion should work in this context:
  UPDATE update_test SET (a,b) = ROW(v.*) FROM (VALUES(21, 100)) AS v(i, j)
    WHERE update_test.a = v.i;
  -- you might expect this to work, but syntactically it's not a RowExpr:
  UPDATE update_test SET (a,b) = (v.*) FROM (VALUES(21, 101)) AS v(i, j)
    WHERE update_test.a = v.i;


It's for when you want to update more than one column with values from the same record in another table, eg:

  UPDATE books
    SET (title, isbn) = (
    SELECT title, isbn FROM other_books
  WHERE other_books.foo = books.bar);

You can also do it with UPDATE ... FROM (which may be more efficient), but that's a PostgreSQL extension, while the added-in-9.5 syntax is SQL standard.

  UPDATE books
    SET title = other_books.title, isbn=other_books.isbn
    FROM other_books
    WHERE other_books.foo = books.bar;


That's not the row syntax though, that's the sub-select syntax (my initial confusion). i.e., there are three options according to the grammar:

  SET { column_name = { expression | DEFAULT } |
        ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |
        ( column_name [, ...] ) = ( sub-SELECT ) }
The second option, the "row" syntax (to which the optional "ROW" keyword was recently added), doesn't allow a row expression, only per-column expressions. The sub-select syntax covers row expressions (as your example demonstrates).
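Concretely, the three forms side by side (hypothetical tables `t` and `s`):

  -- 1. plain per-column assignments
  UPDATE t SET a = 1, b = DEFAULT WHERE id = 42;

  -- 2. "row" syntax: column list = expression list; the ROW keyword in
  --    front of the list is the part that's new (and optional) in Pg10
  UPDATE t SET (a, b) = ROW (1, DEFAULT) WHERE id = 42;

  -- 3. sub-select syntax: the only form that accepts a row-valued query
  UPDATE t SET (a, b) = (SELECT s.x, s.y FROM s WHERE s.id = t.id)
    WHERE id = 42;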


TIL too - works in 9.5 as well it seems: http://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=57e112a297ef86...


Looks like a purely cosmetic change. Scroll down to the bottom of https://www.postgresql.org/docs/10.0/static/sql-update.html to see the only related difference from https://www.postgresql.org/docs/9.6/static/sql-update.html .


I think it might also simplify the syntax when you want to update multiple columns from a composite type (which is essentially a ROW), so that writing a subquery is no longer needed.
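For example (names made up), if the new values arrive as a composite value, the row syntax might let you pick its fields apart on the right-hand side instead of wrapping everything in a correlated sub-select. I'm not sure it's shorter, but it reads differently:

  -- hypothetical composite type matching two columns of the target table
  CREATE TYPE edit AS (modified timestamptz, body text);

  UPDATE comment
    SET (modified, body) = ROW ((s.e).modified, (s.e).body)
    FROM (SELECT ROW(now(), 'edited comment')::edit AS e) AS s
    WHERE id = 123;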


As exciting as this is (and it really is), it's yet another thing that explodes the number of possibilities for optimization. It's getting towards the point of unmanageability, and I hope there will be a movement towards, if not auto-tuning databases, then assistive tools for exploring both this and indexing possibilities.


Yes, at this point it's a manual optimization task - you have to decide which statistics you need (e.g. by looking for poor estimates in query plans) and then create them. Pretty much just like indexes.
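The manual workflow, sketched with a made-up table: look for a large gap between estimated and actual row counts in EXPLAIN ANALYZE output, then create a statistics object covering the correlated columns:

  -- an estimate of rows=1 vs. thousands actually returned hints that the
  -- planner is treating correlated columns as independent
  EXPLAIN ANALYZE SELECT * FROM shipments WHERE city = 'X' AND zip = 'Y';

  CREATE STATISTICS shipments_city_zip (dependencies)
    ON city, zip FROM shipments;
  ANALYZE shipments;  -- re-run the EXPLAIN and compare the estimates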

The ultimate goal is to detect this automatically and either create the statistics right away or at least provide suggestions to the user. That's not in Pg10, though.


I think the best way to handle this situation is for the default install to be generally performant on any given task, with the option to flip switches for those use cases where extreme performance is needed due to the specific workload.

If that's the case (which I suspect it is, having used prior PostgreSQL versions), then most people will never need to optimize it, but those who have an extreme use case that can benefit from such tweaks will have the options available.


CMU's DBMS is trying to do this: http://pelotondb.io



