
New In Postgres 12: Generated Columns - craigkerstiens
https://pgdash.io/blog/postgres-12-generated-columns.html
======
aidos
This is great news. Those weird little cases where you need to either defer
the computation until read time, or precompute and store yourself always felt
warty and ripe for errors. Really glad to see this addition.

EDIT: an example from this week: we have a json blob full of stuff and we want
to pluck out a specific field to search on. You need to jump through casting
hoops to hoist it out as an integer when you query. It’s doable, but the onus
is on everyone who queries to understand this.
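
With 12 you can bake that cast in once at the table level instead. A minimal sketch (the table and field names here are invented for illustration, not taken from the comment above):

```sql
-- Hypothetical schema; Postgres 12 syntax.
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    -- Hoist the field out as an integer at write time, so anyone
    -- querying can just use events.user_id, no casting required.
    user_id integer GENERATED ALWAYS AS ((payload ->> 'user_id')::integer) STORED
);
```

The generation expression has to be immutable, which a text-to-integer cast is.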

~~~
AhtiK
This JSON plucking is also much easier now with the json path expressions
coming up in v12!

~~~
rattray
Can anyone share a link?

~~~
onion2k
[https://www.postgresql.org/docs/12/functions-json.html](https://www.postgresql.org/docs/12/functions-json.html)
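
A quick sketch of what the new jsonpath support looks like (against a hypothetical `events` table with a `payload` jsonb column):

```sql
-- Extract values with a path expression:
SELECT jsonb_path_query(payload, '$.user.id') FROM events;

-- Or filter directly with the jsonb @@ jsonpath predicate:
SELECT * FROM events WHERE payload @@ '$.user.id == 42';
```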

------
mopierotti
This seems like a cool feature, but would I be correct in thinking that:

1) If you're only using the generated column for filtering/sorting rows, you'd
be better off using indexes on expressions?
([https://www.postgresql.org/docs/current/indexes-expressional.html](https://www.postgresql.org/docs/current/indexes-expressional.html))

2) Therefore, if you're instead interested in returning the generated columns'
values, this feature would be useful in proportion to how expensive the
expression you're using is, because you're saving time by precomputing the
column rather than computing at query time.

Edit: I can also see the benefit of removing the burden on the person
performing the query to have to remember the details of the expression, or in
the server case, not having to duplicate the expression across code bases.
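
For point 1, an expression index is a one-liner (sketch; the table and expression here are hypothetical):

```sql
-- Index the expression itself; no extra column is stored in the table.
CREATE INDEX events_user_id_idx
    ON events (((payload ->> 'user_id')::integer));

-- The planner uses it only when a query repeats the same expression:
SELECT * FROM events WHERE (payload ->> 'user_id')::integer = 42;
```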

~~~
combatentropy
> If you're only using the generated column for filtering/sorting rows, you'd
> be better off using indexes on expressions?

Good point!

> I can also see the benefit of removing the burden on the person performing
> the query to have to remember the details of the expression, or in the
> server case, not having to duplicate the expression across code bases.

You can do that also with a database view or function, which is my preference.

I prefer my tables to be fully normalized. Any computation or processing, I
try to keep in views. This just helps my mind. Tables = hard data. Views =
processed data. But maybe I'm just set in my ways. Calculated columns are in
the SQL standard, after all, and have been implemented in other databases for
some time. In special cases (heavy calculation + heavy reads) of course this
feature makes a bit more sense.
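
For instance, the view flavour of the same idea might look like this (names are made up):

```sql
-- Table = hard data; view = processed data.
CREATE VIEW purchase_with_vat AS
SELECT p.*,
       p.value * 0.25 AS vat   -- computed at read time, never stored
FROM purchase p;
```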

~~~
eropple
I think your approach makes more sense when thinking about databases directly.
If you're using an ORM, you can use a view but it's kind of awkward. I see
this as being more useful in that ORM universe--it just ends up being a read-
only field on your model.

~~~
dragonwriter
> If you're using an ORM, you can use a view but it's kind of awkward

If it's awkward to use a view in an ORM, it's a bad ORM. Your ORM shouldn't
care if a relvar is a table or a view. (It obviously might care if it's
updatable or not, but updatable views—both automatically updatable and
updatable via specific trigger programming—are a common thing, as are read-only base tables.)

~~~
eropple
I dunno, I find using relations I'm not supposed to modify pretty odd in an
ORM. And YMMV, but I've never seen an updatable view in the wild. I know it's
doable, particularly in Postgres, but it seems like something capital-S
Surprising to...probably most folks I've ever worked with?

------
Ididntdothis
I tend to follow the Postgres releases and I am always impressed by the cool
things SQL databases can do. But for programmers like me who are used to coding
in C++/C#, the transition from those languages to SQL always feels harsh.
Especially if you don’t use SQL daily, it’s really hard to remember the
syntax and to read complex SQL code. The transition from SQL results to
typed languages is also tedious. ORMs help a little, but they have their own
set of problems.

Not sure what I am trying to say other than that I wish it was easier for
regular programmers to use advanced SQL features...

~~~
dewey
I found this book to be very helpful, I'm not done yet but so far I'm really
enjoying it.

"Perfectly intelligent programmers often struggle when forced to work with
SQL. Why? Joe Celko believes the problem lies with their procedural
programming mindset, which keeps them from taking full advantage of the power
of declarative languages. The result is overly complex and inefficient code,
not to mention lost productivity."

[https://www.amazon.com/Joe-Celkos-Thinking-Sets-Management/dp/0123741378](https://www.amazon.com/Joe-Celkos-Thinking-Sets-Management/dp/0123741378)

~~~
Ididntdothis
I think it’s not only the procedural vs declarative difference but also just
the plain syntax. To me SQL is just hard on the eyes. It feels a little like
FORTRAN in the good old days.

~~~
misterdoubt
Well it doesn't help that people continue using ALL CAPS for SQL keywords as a
preferred style. In this century.

~~~
zo1
I find that it helps differentiate the different components of the query. Same
way we use all-caps for constants, etc. If it's lower-case, or camel-case,
then it just sort of "melts" into the rest of the query.

~~~
pletnes
Not with syntax highlighting.

~~~
preommr
The tooling around sql is pretty awful. I was looking for a vscode plugin to
get some syntax highlighting and linting for postgres yesterday, and just gave
up. There are plenty of administrative tools that help with connecting to dbs
and showing the results in a nice table, but nothing that really helps with
writing it (or at least nothing of high quality that I could find).

~~~
dewey
[https://www.jetbrains.com/datagrip/](https://www.jetbrains.com/datagrip/) and
Postico are pretty great for PG I think. I'm using them every day and having
the option to attach DataGrip to a directory of queries in your code and
directly executing them from there with set variables is pretty sweet.

It helps you write queries with auto-formatting and good auto-completion.

------
Mister_Snuggles
The first database I used professionally was called UniVerse[0], currently
owned by Rocket Software. It's a Pick-style, non-relational database.

In the data dictionary for a file you could create I-descriptors, which were
computed columns much like this feature allows. The difference is that
I-descriptors were always calculated on the fly and they could do a LOT more
than PostgreSQL's generated columns.

These were commonly used to accomplish things that SQL would use a JOIN to do,
mainly because the query language didn't have joins. An INVOICES file, for
example, would have fields like CUSTOMER.NAME, CUSTOMER.ADDRESS, etc, (note
that the '.' is just another character in the field name, it doesn't actually
mean anything to the database) which would pull the relevant information from
the customer file or call a subroutine to find the relevant information (e.g.,
in a history file).

The results of the I-descriptor don't have to be stable - they can be
calculated based on the current date/time, random numbers, data in other
files, etc. This leads to some interesting possibilities that I don't think
PostgreSQL's implementation can touch. It also leads to some interesting
gotchas.

I don't have a good reference handy for UniVerse's I-descriptors, but the
System Description document[1] has a section on it.

It had a certain elegance that I miss in modern SQL databases. On the flip
side, modern SQL databases are so much more powerful.

[0] [https://www.rocketsoftware.com/products/rocket-universe-0/rocket-universe](https://www.rocketsoftware.com/products/rocket-universe-0/rocket-universe)

[1] [https://docs.rocketsoftware.com/nxt/gateway.dll/RKBnew20%2Funiverse%2Fprevious%20versions%2Fv11.1.15%2Funiverse_systemdescriptionuserguide_v11115.pdf](https://docs.rocketsoftware.com/nxt/gateway.dll/RKBnew20%2Funiverse%2Fprevious%20versions%2Fv11.1.15%2Funiverse_systemdescriptionuserguide_v11115.pdf)

~~~
dragonwriter
> The difference is that I-descriptors were always calculated on the fly and
> they could do a LOT more than PostgreSQL's generated columns.

So, just like columns in a view.

> These were commonly used to accomplish things that SQL would use a JOIN to
> do, mainly because the query language didn't have joins.

Yeah, in SQL you could do that with a correlated subquery in a column
definition in a view, instead of a join, but it's generally not optimal (I've
seen people do it, though not recently). There are other uses for correlated
subqueries besides being the inefficient way to do joins, and they, plus
normal functions, let you calculate just about anything you might want in a
column of a view.

> The results of the I-descriptor don't have to be stable - they can be
> calculated based on the current date/time, random numbers, data in other
> files, etc. This leads to some interesting possibilities that I don't think
> PostgreSQL's implementation can touch.

Generated columns can't do that, since they support only immutable functions
(the values are materialized, which wouldn't make sense otherwise), but
on-the-fly calculation makes reads expensive. If you need that, though,
Postgres has supported it in views since (approximately) forever, and views
are the classic SQL approach to the problem.

~~~
Mister_Snuggles
Yes, for the common case a view does the trick.

However, I-Descriptors have the power of the underlying UniVerse BASIC runtime
at their disposal, so you can do a LOT of things. I mention one of these in
another comment[0].

However, as I mentioned, doing it this way isn't necessarily a good idea.

[0]
[https://news.ycombinator.com/item?id=21137935](https://news.ycombinator.com/item?id=21137935)

~~~
dragonwriter
Postgres views have the power of the installed procedural language runtimes
behind them; it's pretty common for this to include an unrestricted Python
interpreter, among others.

------
est
[https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html](https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html)

I am a bit surprised nobody has mentioned yet that MySQL 5.7 already had
generated columns. Any differences between the two?

~~~
atombender
I've not studied the documentation in detail, but MySQL looks pretty good
here. It allows partitioning on a generated column, and Postgres currently
does not.

------
Erwin
An alternate way of doing this is to pass the entire current table row to a
function, which is easy: if you have a table "purchase", PG also creates a
"purchase" type, so if you have this function:

    
    
        CREATE FUNCTION vat(a purchase) RETURNS numeric AS 'SELECT a.value * .25' LANGUAGE sql;
    
    

Then you can run:

    
    
        SELECT value, vat(purchase.*) FROM purchase;
    

That lets you use every purchase column within the SQL function to do your
calculation.

There are two interesting shortcuts. You can call just vat(purchase), because
the type is an alias for the current table row. That alias is very confusing,
though, and this is not recommended (try SELECT table FROM table!).

There's also a method-like shortcut, which the documentation calls "functional
notation":

    
    
        SELECT value, purchase.vat FROM purchase;
    

This calls your VAT() function passing it the entire purchase row, see
[https://www.postgresql.org/docs/11/rowtypes.html#ROWTYPES-USAGE](https://www.postgresql.org/docs/11/rowtypes.html#ROWTYPES-USAGE)

I don't actually know whether PG can correctly inline the necessary code here
-- it is certainly better able to do so if you use a LANGUAGE sql function.

~~~
philliphaydon
This isn’t a good example, because you wouldn’t pre-calculate VAT and store it
for a product listing: VAT doesn’t apply to all countries, it’s subject to
change, and it differs between countries. (Japan just changed its GST
yesterday.)

You also wouldn’t want to run a function on something you need to filter
against. To give you an example, we have the concept of a “deadline” date,
which is based on the time the record is stored plus a period of time within
which it must be completed.

Calculating that column in sql before doing a where filter is crazy slow when
you’re looking at millions and millions of records. But pre-calculating it and
storing it, and then adding an index on top of it, is insanely fast.

This is currently done in code. But if I moved it to a computed column, I
could ensure the result is always up to date if the period changes, and avoid
code accidentally forgetting to update this value.

There are use cases for computed columns, as there are for functions, and for
doing it in code.

This feature in pg12 mainly gives us the ability to index the value which we
couldn’t do before.
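
A sketch of that deadline case as a PG 12 stored generated column (the table and column names here are invented; note the generation expression must be immutable, which is why this sketch uses timestamp without time zone -- timestamptz plus interval is only stable):

```sql
CREATE TABLE jobs (
    id       bigserial PRIMARY KEY,
    created  timestamp NOT NULL,
    period   interval  NOT NULL,
    -- Precomputed once per insert/update, kept consistent by the DB:
    deadline timestamp GENERATED ALWAYS AS (created + period) STORED
);

-- The stored value can be indexed, so the filter becomes a cheap index scan:
CREATE INDEX jobs_deadline_idx ON jobs (deadline);
```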

~~~
taffer
I'm not sure if I understand you correctly, but logically the WHERE clause
happens before the SELECT clause[1], so it only calculates this value for the
rows you're interested in. It is also possible to index functions without
generated columns.

[1] [https://blog.jooq.org/2016/12/09/a-beginners-guide-to-the-true-order-of-sql-operations/](https://blog.jooq.org/2016/12/09/a-beginners-guide-to-the-true-order-of-sql-operations/)

~~~
philliphaydon
In the first part I’m saying it’s a bad example, because you wouldn’t store
price + VAT in a column, let alone a calculated column.

In the second part I’m giving an example where doing:

    where created + period > now() - '3 days'::interval

means having to calculate the value in the WHERE clause, which is inefficient.

Making a calculated column that adds created and period, then indexing it, is
more efficient.

------
makach
At first, I thought; /hey pretty cool feature!/

But after contemplating it, is this really necessary? I fear putting business
logic and meaning into the wrong layer; there is use and there is abuse, and I
fear the latter.

~~~
stupidcar
Why do you think placing business logic into the database system is a layering
violation?

Just because most developers use their DB as a dumb store doesn't mean it
needs to be. There are plenty of successful software systems that place the
majority of their business logic in the database and use a general-purpose
programming language and runtime only for the presentation layer.

If you're comfortable fully exploiting the capabilities of your DB, then the
intelligent combination of a relational model with custom data types,
constraints, triggers, views and stored procedures can make the DB the perfect
place to implement business logic.

~~~
atroche
Agreed. After all, if you wanted to keep all "business logic" out of the db
you wouldn't even use foreign key constraints.

~~~
arethuza
You probably wouldn't use multiple tables or multiple columns either and just
have a single table that stores document-like rows... which vaguely reminds me
of something.

~~~
dtech
PostgreSQL is a pretty good document store, and when you need it you have the
relational model available and integrated.

------
stockkid
This is indeed very nice for maintaining things such as indices for full-text
search. Solutions based on triggers never felt easy to maintain, because
triggers are one step removed from the table itself.

------
tirumaraiselvan
Why do they call it Generated Columns and not Computed Columns like most other
databases already do?

~~~
PeterisP
IMHO "computed columns" generally implies columns that are computed on-demand
when read, as opposed to this case when they're computed when the row is
altered and persisted in the table.

------
hestefisk
This is really nice. Acts almost as a materialised view.

~~~
yourad_io
I think those have to be explicitly refreshed, no?

~~~
yourad_io
Ah - there are two kinds of generated columns:

> There are two kinds of generated columns: stored and virtual. A stored
> generated column is computed when it is written (inserted or updated) and
> occupies storage as if it were a normal column. A virtual generated column
> occupies no storage and is computed when it is read. Thus, a virtual
> generated column is similar to a view and a stored generated column is
> similar to a materialized view (except that it is always updated
> automatically).

[https://www.postgresql.org/docs/12/ddl-generated-columns.html](https://www.postgresql.org/docs/12/ddl-generated-columns.html)
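
Worth noting: Postgres 12 implements only the stored kind, so the STORED keyword is mandatory. A minimal example:

```sql
-- VIRTUAL is not available in PG 12; omitting STORED is an error.
CREATE TABLE t (
    a int,
    b int GENERATED ALWAYS AS (a * 2) STORED
);
```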

------
hultner
I've been using this for a while in a hobby project, great feature which makes
for neat declarative schemas.

Unfortunately some tooling is choking on this; for instance DataGrip crashes
on introspection. But hey, that's what we can expect for being early adopters.
:-)

------
laxentasken
Looking through the docs, I cannot find the limitations on what you can
generate (e.g. only simple column * column, or more "advanced" expressions
like the ones you can use in an UPDATE statement)?

------
yourad_io
I wonder if a generated column can be referenced as a foreign key. Weirdness
would probably ensue, so I'd guess... no?

~~~
mslot
Yes. A generated column can be included in a foreign key and be referenced by
a foreign key. Nothing too weird happens. It's basically like having every
update and insert specify the column value.

------
72deluxe
Fascinating reading and looks very useful. I will try to get away from SQL
Server for a bit to look at postgresql again.

Thanks

------
ako
Seems like a slightly more convenient way than using a trigger to achieve the
same...

------
jitans
This is one of those features to never use, unless you have another problem:
an application touching a table directly instead of going through a view.

~~~
icebraining
Why? This does something a (postgres) view cannot. It's useful even if you
have a view in front of the table.

------
iblaine
The 1% of people who wanted this feature are surely thrilled. It is a cool
feature. The other 99% of us are looking at it like a potential land mine that
will show up unexpectedly when refactoring old code or migrating away from
Postgres.

~~~
dragonwriter
Generated/computed columns are a fairly common RDBMS feature; MySQL has had
them since 5.7, and SQL Server, Oracle, and DB2 have them too. So Postgres
having them makes migrating into Postgres easier, and migrating out no more
difficult, except perhaps if you are migrating to SQLite or a nonrelational
store.

------
redact207
As a developer I always hated this feature in other RDBMSs: the idea of your
app saving a row and then retrieving it, only to find more data in it.

IMO these types of calcs are better done in your app, where you can at least
write a test and assert it's doing the right thing. It also makes it a lot
easier to reason about your code if the logic is in the app rather than bits
of it stuck in column definitions.

Perhaps there are legitimate uses of this, maybe for DBs that aren't just
repositories for apps?

~~~
Ididntdothis
Maybe there is a performance benefit to calculating values once at update time
vs millions of times during query?

~~~
redact207
Wouldn't it be better to do the calc in your app and save it to the db? Best
of both worlds

~~~
yourad_io
So long as everyone, everywhere, always remembers to calculate it in the same
way for all inserts and updates.

This seems a lot more robust to me.

~~~
GordonS
You can use abstractions that push such code out of the developer's concern,
into the "code infrastructure" layer.

But there are also plenty times where I'd rather compute at the database side;
there are no silver bullets.

~~~
magicalhippo
And then someone at support comes along and updates the data via SQL and
forgets to update the derivative field because he got woken up at 2am for an
emergency.

Using a computed column ensures the data is consistent.

