Edit: an example from this week: we have a JSON blob full of stuff and we want to pluck out a specific field to search on. You have to jump through casting hoops to hoist it out as an integer when you query. It's doable, but the onus is on everyone who queries to understand this.
Relating JSON/XML API response blobs to queryable columns is a huge and inefficient burden, which I hope these features can help to alleviate.
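A sketch of both approaches, with invented table and field names (assuming PostgreSQL 12 for the generated column):

```sql
-- A table of raw API responses (hypothetical schema).
CREATE TABLE api_events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

-- Without a generated column, every query repeats the extraction and cast:
SELECT * FROM api_events WHERE (payload ->> 'user_id')::int = 42;

-- With a stored generated column, the cast lives in the schema once,
-- and the result can carry a plain btree index:
ALTER TABLE api_events
    ADD COLUMN user_id integer
    GENERATED ALWAYS AS ((payload ->> 'user_id')::int) STORED;

CREATE INDEX api_events_user_id_idx ON api_events (user_id);

SELECT * FROM api_events WHERE user_id = 42;
```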
Many places online recommend storing this collection of divided geometries as a materialized view, but I recently had to move it to a separate real table because inserting a single new record would take 15 minutes to update the view (on an RDS 4xlarge database). It could at least refresh concurrently, so other reads didn't block, but now that the divided geometries are stored in a separate table I can usually add new records in under 5 seconds.
I know there are things you can do, but having better tools available is always a win (especially when they remove later maintenance from the equation).
> "Such functionality was earlier usually achieved with triggers, but with generated columns this becomes much more elegant and cleaner."
I don’t have a link at hand from a discussion on the mailing list, but I’ll take Peter’s word for it.
"Indexes: Generated columns can be used in indexes, but cannot be used as a partition key for partitioned tables."
I also hope it can be used for a partition key in the future.
1) If you're only using the generated column for filtering/sorting rows, wouldn't you be better off with an index on an expression? (https://www.postgresql.org/docs/current/indexes-expressional...)
2) Conversely, if you're instead interested in returning the generated column's values, this feature is useful in proportion to how expensive the expression is, because you save time by precomputing the column rather than computing it at query time.
Edit: I can also see the benefit of removing the burden on the person performing the query to have to remember the details of the expression, or in the server case, not having to duplicate the expression across code bases.
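A minimal sketch of the two options, using an invented `people` table:

```sql
CREATE TABLE people (id bigserial PRIMARY KEY, email text NOT NULL);

-- 1) Expression index: queries must repeat the expression verbatim
--    for the planner to use the index.
CREATE INDEX people_lower_email_idx ON people (lower(email));
SELECT * FROM people WHERE lower(email) = 'ada@example.com';

-- 2) Stored generated column (PG 12): computed once at write time,
--    so reads stay cheap and queries stay simple.
ALTER TABLE people
    ADD COLUMN email_lower text GENERATED ALWAYS AS (lower(email)) STORED;
CREATE INDEX people_email_lower_idx ON people (email_lower);
SELECT * FROM people WHERE email_lower = 'ada@example.com';
```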
> I can also see the benefit of removing the burden on the person performing the query to have to remember the details of the expression, or in the server case, not having to duplicate the expression across code bases.
You can do that also with a database view or function, which is my preference.
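For instance, a plain view can hide the expression without storing anything (table, columns, and the 25% rate are invented for illustration):

```sql
CREATE TABLE purchase (id serial PRIMARY KEY, value numeric NOT NULL);

-- The VAT formula is written exactly once, in the view definition:
CREATE VIEW purchase_with_vat AS
SELECT p.*, p.value * 0.25 AS vat
FROM purchase p;

SELECT vat FROM purchase_with_vat WHERE id = 1;
```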
I prefer my tables to be fully normalized. Any computation or processing, I try to keep in views. This just helps my mind. Tables = hard data. Views = processed data. But maybe I'm just set in my ways. Calculated columns are in the SQL standard, after all, and have been implemented in other databases for some time. In special cases (heavy calculation + heavy reads) of course this feature makes a bit more sense.
Computed columns essentially make the base table into a transparent materialized view over the (noncomputed) “real” base table. But it's closer to an ideal materialized view than actual Postgres materialized views because it self-refreshes on need. But, conceptually, it fills the same role as a matview.
If it's awkward to use a view in an ORM, it's a bad ORM. Your ORM shouldn't care if a relvar is a table or a view. (It obviously might care if it's updatable or not, but updatable views, both automatically updatable and updatable via specific trigger programming, are a common thing, as are read-only base tables.)
I don't use an ORM. But however you specify a table name in the application code, I imagine just specifying the name of a view instead. The ORM selects, inserts, updates, or deletes with the view as the target, instead of a table. It would not know it was not a table.
Not sure what I am trying to say other than that I wish it was easier for regular programmers to use advanced SQL features...
CSS is a similar phenomenon: you can't approach it as code. The box model is a nice way to get the basics and manipulate the DOM.
Similarly, I think in terms of tables (rows and columns) to visualize SQL operations that I need to do. Every operation is like a matrix operation, so you need to stop thinking in terms of for/loops iterators and think of matrix/table wide operators. Once you've got the basics, then you can look at stuff like window functions to do more advanced operations.
Takes a bit of working with to get used to. That's just how I think about it - you'll need to find a metaphor that works for you!
Declarative languages encompass things like HTML, CSS, and SQL. Here we say what should happen, but not specifically how it should happen.
Imperative languages are C, Python, Java, JS etc. Here we give the precise steps which describe how something should work, and there is an absence of describing what the problem actually is.
Declarative languages give us a lot of power to solve a very narrow range of tasks.
Imperative languages give us comparatively less power, but allow us to tackle a very wide range of tasks.
For example: We could certainly write a Python program to sort, filter, group, and join two lists of data. However, the equivalent SQL code would be much shorter and – all things being equal – much more readable.
Conversely, SQL would be a non-starter for writing a web service or an image processing library. SQL does one thing really well, and that thing is data wrangling.
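For example, a filter + join + group + sort over two hypothetical tables is only a few declarative lines:

```sql
-- Revenue per repeat customer in 2019, largest first (invented schema).
SELECT c.name, count(*) AS orders, sum(o.total) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.placed_at >= date '2019-01-01'
GROUP BY c.name
HAVING count(*) > 10
ORDER BY revenue DESC;
```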
Mozilla put out a good video about this today: https://www.youtube.com/watch?v=aHUtMbJw8iA
"Perfectly intelligent programmers often struggle when forced to work with SQL. Why? Joe Celko believes the problem lies with their procedural programming mindset, which keeps them from taking full advantage of the power of declarative languages. The result is overly complex and inefficient code, not to mention lost productivity."
It helps you write queries with auto-formatting and good auto-completion.
Also, leading each new line with the comma.
In all the algebra classes I've taken, one nice thing about sets is you can arrange them in any order you like, and you'll get the same answer any way you slice it. There's no inherently correct orientation of mathematics. That's kind of the point. It's fully generic, by default. "x" can mean anything (at least, in the algebraic structure we've assumed).
With SQL, the experts drone "think in sets!" as a way to mean "you happened to pick a different order than me, so your query will run 1000 times slower". Well, mine is still sets. I'm not writing a for-loop here (and I'm pretty sure my SQL dialect has those by now).
I've actually had the most luck with SQL by thinking about it in terms of looping first. Figure out the most efficient loop over your biggest table, and then write a query in a way that makes it easy for the optimizer to loop over that.
Then again, nobody ever accused me of being "perfectly intelligent".
PostgreSQL has great documentation, and I will give it a serious read one day.
The term is usually applied to mapping from object-oriented to relational systems and vice versa but it’s perhaps more of or also a mismatch between imperative and declarative programming styles.
I think we just have to accept that these two are different and that each is great at what it does instead of trying to shoehorn a relational approach into object-oriented and imperative ones.
I have no idea what all those modifier symbols do in Rust, say, but I find "case when InvoiceNo is null then" or "select Name, list(distinct Title) as Titles from" pretty transparent in comparison.
The biggest mental difference, I think, is that you're dealing with a different data model. You don't have a bunch of individual entities floating around; you've got rows in tables, which you mostly process in bulk. As such it's more like an Excel spreadsheet.
I also wonder if the perceived verbosity is caused by some programmers cramming logic into the queries, rather than keeping them simple and processing the results in the client language.
The only case where things get really verbose for me is when I need to do subqueries for columns. Often I might need to run basically the same subquery but return aggregates over different fields, ie sum of net weight, sum of gross weight, sum of value. For performance reasons I can't write that as a join, at least on our SQL server.
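In Postgres, one way to avoid repeating almost the same subquery per aggregate is a single LATERAL subquery that returns all three sums at once (schema invented for illustration; SQL Server's rough equivalent is CROSS APPLY):

```sql
SELECT s.id,
       t.net_weight_sum,
       t.gross_weight_sum,
       t.value_sum
FROM shipments s
CROSS JOIN LATERAL (
    -- one pass over the detail rows computes every aggregate
    SELECT sum(i.net_weight)   AS net_weight_sum,
           sum(i.gross_weight) AS gross_weight_sum,
           sum(i.value)        AS value_sum
    FROM shipment_items i
    WHERE i.shipment_id = s.id
) t;
```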
I agree there should be a shortcut syntax for joining using foreign keys ("join X on foreign key"?) though.
What's needed is a really good object database for C# to eliminate the impedance mismatch.
Don't get me wrong, SQL is great, but to be useful it needs to be mixed with a procedural language (PL/SQL, T-SQL, etc.). I would just prefer using a more capable typed procedural language instead.
But this is quite a niche usage of PG, and there is little to no tooling supporting it.
At this point, I would say Postgres itself is a tool that doesn't support Postgres inheritance.
I personally do not find it useful to execute .NET code in PG, but technically you could do that.
As long as you can translate LINQ expressions to PG as you can most of them with Revenj you can avoid the pitfalls of mainstream ORMs.
The ability to run procedural code on the DB server allows for much better performance than sending data across the network to the app server for processing. Being able to share procedural code between app server and DB means the choice can be made based on the best place to run it, rather than on whether the code would have to be rewritten to move from one to the other. I suppose a plugin could be made for .NET in PG, similar to PLV8; SQL Server has had .NET stored procedures for some time, with many issues and caveats, so the feature doesn't get much use.
The problem with LINQ, and what makes it so nice to use, is the ability to mix procedural code into the LINQ statement, e.g. list.Where(x => x.SomeMethod()). It's very easy to write something that can't be translated and then starts streaming the entire intermediate result to the app server to run SomeMethod there, when it would be much more performant in the DB tier, local to the data.
I am thinking more along the lines of an embedded DB like SQLite, but one that does native serialization, indexing, and query optimization against .NET objects and types.
Then making a "database server" is really just an app server running your C#. If you really wanted to make it like PG, you would have an app server that accepts code snippets, compiles them, runs them, and returns results. Of course you would need proper sandboxing, like .NET Fiddle. You could also do something in between, like serializing LINQ expression trees to send to the server for processing.
More than 20 years ago I actually used it as a true application server, where the database application layer would even generate all the HTML code required to display your app's UI.
And with database schemas you have proper sandboxing.
The only downside is that the only interface to your data is through the application, but there's certainly a use-case for something like that.
It stores it as jsonb, so you don't need to worry about weird relationships; you only worry about your root aggregate. You can index it, apply full-text search, and persist columns out of the map for convenience or speed.
So yeah, it maps the LINQ to the equivalent SQL for you, but it allows you to work with your object and worry less about persistence, because you can add and remove properties. It's the closest I've come to being able to forget about the database and focus on code.
I guess unless you write your own database from the ground up, so that it supports the language as a native feature without dropping back to something else, it won't happen.
However, the mismatch is always there: the overhead of mapping types, inefficient serialization (key names repeated over and over in JSON vs. a real schema), certain statements that cannot be transpiled to SQL so you hit pathological cases at unexpected times, the inability to run any statement you want in the DB tier to take advantage of locality, and so on.
Most everyone wants an object database, since most people are working with objects, which is why ORMs are so popular.
Closest thing I ever saw was db4o.
This really adds up on a large database. Mongo used to recommend short key names, and I even think some clients would translate them, based on a definition, to try to save disk/memory/IO.
https://github.com/postgrespro/zson tries to do key compression.
Also, it's not that LINQ can't do what you want; it's that if you do certain things that can't be translated to SQL, the whole intermediate result will be retrieved from the DB and deserialized, and then the LINQ statement will be run in memory on the client. It's like putting ToList in the middle of your LINQ statement where maybe a few million rows might need to be processed to give a very small result.
If the LINQ statement were run in-process on the DB server, at least it would happen local to the data and would be more akin to a table scan vs. an index scan: still not great, but much better than sending all the results to the client for processing.
What I found helpful was to prototype all the migrations and queries (of whatever feature or user flow you're building) in a .sql file first, with mocked values. You'll be surprised by how much you can do if you're simply forced to write pure SQL. Once you've reached the limit of what you can do, you can move the queries back into your app to wire the rest up. This is usually not much work if you use raw queries instead of an ORM. You usually end up writing much more efficient apps this way!
You can take your SQL knowledge to another platform, but less so with say Entity Framework or LINQ. Don't get locked in. Plus, if the ORM doesn't do your query correctly or as intended, someone will need to understand SQL anyhow to debug it.
We have so many ORMs and most often they look more ugly than pure SQL to me, especially when it comes to more complex queries.
You could write a translator (something like coffeescript -> JS but X -> SQL), but can you really come up with a syntax that's enough of an improvement that it would be worth using it?
I also love ORMs. But perhaps my knowledge of SQL has enhanced my ability to use them well. Without them you end up with numerous other problems which I'd consider worse than the occasional compromises ORMs require. Ideally you want all your business logic in one place, or at least in discrete units, so the fragmentation of stored procedures is a major downside. Embedding SQL directly into other business logic is like an uglier version of an FFI, and serializing SQL statements is even more of a security concern than marshalling FFIs (something that can already be a bit of a footgun).
I'd recommend that any engineer learn SQL, but for people who already use ORMs a lot, I'd suggest starting by understanding explain plans first.
select foo, bar from baz
from baz select foo, bar
Maybe? Seems like ReQL itself will live on, even if not the whole RethinkDB: https://github.com/rethinkdb/rethinkdb/issues/6747
Today if you asked me to design an object model for software to model objects, I wouldn't end up with anything remotely like C++/C# has.
It's a fundamentally different way of thinking.
With imperative programming, the mindset is: "I have this certain outcome in mind. What is the sequence of steps I need to take to make it happen?"
In a relational database, the mindset you need to be in is: "How can I define the relational structure of my world (i.e. my data) so that I can get the outcomes I want - and end up saving a great deal of imperative programming effort in the longer run?"
Quite different. And yes, it takes a while to get used to.
In the data dictionary for a file you could create I-descriptors, which were computed columns much like this feature allows. The difference is that I-descriptors were always calculated on the fly and they could do a LOT more than PostgreSQL's generated columns.
These were commonly used to accomplish things that SQL would use a JOIN to do, mainly because the query language didn't have joins. An INVOICES file, for example, would have fields like CUSTOMER.NAME, CUSTOMER.ADDRESS, etc, (note that the '.' is just another character in the field name, it doesn't actually mean anything to the database) which would pull the relevant information from the customer file or call a subroutine to find the relevant information (e.g., in a history file).
The results of the I-descriptor don't have to be stable - they can be calculated based on the current date/time, random numbers, data in other files, etc. This leads to some interesting possibilities that I don't think PostgreSQL's implementation can touch. It also leads to some interesting gotchas.
I don't have a good reference handy for UniVerse's I-descriptors, but the System Description document has a section on it.
It had a certain elegance that I miss in modern SQL databases. On the flip side, modern SQL databases are so much more powerful.
So, just like columns in a view.
> These were commonly used to accomplish things that SQL would use a JOIN to do, mainly because the query language didn't have joins.
Yeah, in SQL you could do that with a correlated subquery in a column definition in a view, instead of a join, but it's generally not optimal (I've seen people do it, though not recently.) There's other uses for correlated subqueries besides being the inefficient way to do joins, and they, plus normal functions, let you calculate just about anything you might want in a columns in a view.
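A sketch of that pattern with invented tables: a view column filled by a correlated subquery instead of a join:

```sql
CREATE VIEW invoices_v AS
SELECT i.*,
       -- runs once per invoice row; a join is usually cheaper
       (SELECT c.name FROM customers c WHERE c.id = i.customer_id)
           AS customer_name
FROM invoices i;
```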
> The results of the I-descriptor don't have to be stable - they can be calculated based on the current date/time, random numbers, data in other files, etc. This leads to some interesting possibilities that I don't think PostgreSQL's implementation can touch.
Calculated columns support only immutable functions, because the values are materialized; anything else wouldn't make sense there. Computing on the fly instead makes reads expensive. If you need that, though, Postgres has supported it in views since (approximately) forever, and views are the classic SQL approach to the problem.
However, I-Descriptors have the power of the underlying UniVerse BASIC runtime at their disposal, so you can do a LOT of things. I mention one of these in another comment.
However, as I mentioned, doing it this way isn't necessarily a good idea.
UniVerse, for example, gained proper transaction support at some point in its lifetime. When I showed the consultant developer, an old-school Pick guy, how that worked he was very impressed.
Granted, it was fairly limited since they somewhat faked it using the regular locking mechanisms and it was only usable within UniVerse BASIC, but it did what it said on the tin. I ended up writing my own simple command interpreter which allowed me to use the UniVerse command prompt and BEGIN/COMMIT/ROLLBACK transactions.
Transactions are definitely foreign to a LOT of Pick code though. I also worked on an ERP system that ran on a Pick-style system - if it crashed during GL posting it was completely possible, and this did happen frequently, that you would get a half-posted entry and an out-of-balance ledger. Had they just used the transactions that the underlying database supported they would not have had that problem.
The equivalent in the SQL world is to use your database connection in autocommit mode, which is pretty rare. Most developers know to use transactions or work with a framework that intelligently uses transactions.
I-Descriptors can call subroutines, which opens them up to doing anything that the underlying UniVerse BASIC runtime can do.
I once made an I-Descriptor which called a subroutine to feed the postal code into a web service that returned the geographic location of that postal code. It also implemented caching so it would only call the service for a postal code once. You could simply list out the file and find customers in a particular geographic region just using the query language.
I'm sure this is do-able using something like PL/Python, but I suspect it would be much more difficult to do it all in-database than how I did it in UniVerse.
Whether doing it this way is a good idea is, of course, up for debate. In a modern application I wouldn't dream of doing this on the fly in a view. Just because I could build it the way I did doesn't mean it was a good idea.
I am a bit surprised nobody has mentioned yet that MySQL 5.7 already had generated columns. Are there any differences between the two implementations?
CREATE FUNCTION vat(a purchase) RETURNS numeric AS 'SELECT a.value * .25' LANGUAGE sql;
SELECT value, vat(purchase.*) FROM purchase;
There are two interesting shortcuts: you can call just vat(purchase), because the type name is an alias for the current table row. That alias is very confusing and this is not recommended (try SELECT table FROM table!).
There's also a method-like shortcut, which the documentation calls attribute notation:
SELECT value, purchase.vat FROM purchase;
I don't actually know if PG can correctly inline the necessary code here; it is certainly better able to do so if you use a SQL-language function.
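A self-contained version of the example above (the 0.25 VAT rate and table layout are just for illustration):

```sql
CREATE TABLE purchase (id serial PRIMARY KEY, value numeric NOT NULL);

CREATE FUNCTION vat(a purchase) RETURNS numeric
    AS 'SELECT a.value * 0.25' LANGUAGE sql IMMUTABLE;

INSERT INTO purchase (value) VALUES (100);

-- All three spellings are equivalent:
SELECT value, vat(purchase.*) FROM purchase;  -- explicit row
SELECT value, vat(purchase)   FROM purchase;  -- type alias for the row
SELECT value, purchase.vat    FROM purchase;  -- attribute notation
```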
You also wouldn't want to run a function on something you need to filter against. To give you an example: we have the concept of a "deadline" date, which is based on the time the record is stored plus a period of time within which it must be completed.
Calculating that column in SQL before doing a WHERE filter is crazy slow when you're looking at millions and millions of records. But pre-calculating it and storing it, and then adding an index on top of it, is insanely fast.
This is currently done in code. But if I moved it to a computed column, I could ensure the result is always up to date if the period changes, and avoid code that accidentally forgets to update this value.
There are use cases for computed columns, as there are for functions, and for doing it in code.
This feature in pg12 mainly gives us the ability to index the value which we couldn’t do before.
In the second part I'm giving an example where doing:
where created + period > now() - '3 days'::interval
Having to calculate the value in a where clause is inefficient.
Making a calculated column that adds created and period, and then indexing it, is more efficient.
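A sketch of that, with invented names; note that `timestamp` (without time zone) plus `interval` is an immutable operation and therefore allowed in a PG 12 generated column, while `timestamptz + interval` is only stable and would be rejected:

```sql
CREATE TABLE tasks (
    id       bigserial PRIMARY KEY,
    created  timestamp NOT NULL DEFAULT localtimestamp,
    period   interval  NOT NULL,
    deadline timestamp GENERATED ALWAYS AS (created + period) STORED
);

CREATE INDEX tasks_deadline_idx ON tasks (deadline);

-- The filter now hits a plain btree index instead of
-- computing created + period for every row:
SELECT * FROM tasks WHERE deadline > localtimestamp - interval '3 days';
```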
CREATE INDEX myindex ON mytable (myfunc(mytable));
Another workaround is to create a regular column and an ON INSERT OR UPDATE trigger that forces NEW.that_column to a computed value. This approach gets you persistent computed columns that you can index.
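Roughly like this (the schema and the VAT formula are invented for illustration):

```sql
CREATE TABLE purchase (
    id    serial PRIMARY KEY,
    value numeric NOT NULL,
    vat   numeric          -- kept in sync by the trigger below
);

CREATE FUNCTION purchase_set_vat() RETURNS trigger AS $$
BEGIN
    NEW.vat := NEW.value * 0.25;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION is PG 11+; older versions spell it EXECUTE PROCEDURE.
CREATE TRIGGER purchase_vat_trg
    BEFORE INSERT OR UPDATE ON purchase
    FOR EACH ROW EXECUTE FUNCTION purchase_set_vat();

CREATE INDEX purchase_vat_idx ON purchase (vat);
```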
What's surprising is that the new functionality in PG 12 doesn't support non-persistent computed columns, and that the other limitations (which are good) don't quite square with the second workaround I mentioned.
But after contemplating it, is this really necessary? I fear putting business logic and meaning into the wrong layer; there is use and there is abuse, and my concern is with the latter.
Just because most developers use their DB as a dumb store doesn't mean it needs to be one. There are also plenty of successful software systems that place the majority of their business logic in the database and use a generic programming language and runtime only for the presentation layer.
If you're comfortable fully exploiting the capabilities of your DB, then the intelligent combination of a relational model with custom data types, constraints, triggers, views and stored procedures can make the DB the perfect place to implement business logic.
Until db vendors start taking developer happiness seriously, stored procedures and triggers are a total non starter for any serious work.
I mean, in most databases you can't even rename a column without causing an enormous blocking migration. What? Why can't this happen in the background? Why can't I have column aliases? The very basics of developer happiness aren't covered, let alone the harder bits, like versioning stored procedures, switching schema and code when switching to a new git branch, and so on.
(EDIT: of course there are open source tools that help with all of the above, but they're all swimming upstream, fighting a database that simply can't imagine change, and are usually terribly leaky abstractions as a result)
Can you be more specific? What exactly are you missing?
As long as you put your code and data into separate schemas and follow good technical practices, it shouldn't be too different from other technologies.
> I mean, in most databases you can't even rename a column without causing an enormous blocking migration.
Changing a column name is just a metadata change, so it shouldn't take too long in Postgres.
> The very basics of developer happiness aren't covered, let alone the harder bits, like versioning stored procedures, switching schema and code when switching to a new git branch, and so on.
Does Tomcat or any other application server version your WAR files for you or does it manage git for you?
I’d expect most DBs to be like this: https://dba.stackexchange.com/questions/189794/performance-i...
A rename should be just a metadata change.
The trick is to keep declarative business logic in the database layer, and imperative business logic in the application layer.
This allows a large team of developers to move quickly without breaking things.
Can you give some examples of what would constitute declarative business logic vs imperative business logic?
This new feature in PostgreSQL is a great example of that: generated columns (declarative logic) were introduced to reduce the need for triggers (imperative logic).
In SQL, declarative logic consists of constraints, indexes, views, and prepared statements. They can significantly increase the efficiency and reliability of the entire system. Imperative logic is mostly triggers and stored procedures, both of which can become hard to maintain and scale.
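Two tiny examples of the declarative kind (names invented):

```sql
-- A constraint states the rule once; the database enforces it everywhere.
ALTER TABLE invoices
    ADD CONSTRAINT invoices_total_nonnegative CHECK (total >= 0);

-- A view states a derivation once; every consumer gets the same answer.
CREATE VIEW open_invoices AS
SELECT * FROM invoices WHERE paid_at IS NULL;
```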
This way I can select the top N trades for a given key without having to do the computation in the application, or storing redundant information in the DB.
Changing a database schema is a big deal. It might take a lot of time, or it must be done with great caution to keep the database online. It's hard to version properly, and it's often hard or simply impossible to roll back a bad update.
Generally the database is state and the application is stateless. You can couple them, but decoupling works better.
* Change your application logic very easily.
* Use a transaction to deploy new code! Zero downtime! 
* Use read replicas to balance load in many cases.
* Roll back bad application updates
* Test anything 
* Use myriad development tools, from Git to CI systems.
* Use row level security, so that every user can only see their own data
* Only allow applications to call your code schema, never let them touch your data directly.
Looking at the Postgres driver it seems easier but is there any good tutorial on how to do error handling properly for databases?
Putting complex business logic in the database is opening up all sorts of interesting new ways for data to be unavailable, corrupted or complicated to access. It is easy to imagine it working out well for simple requirements where there just needs to be something that works.
PostgreSQL is a piece of software that takes data and enforces the relational data model on it. Great idea. But the relational model of data is really only tuned to relational algebra. Put complex logic in there and all that is really being accomplished is now you can't migrate away from PostgreSQL. Relational databases already have great integration with every other programming language in current use.
That's what schemas are for. You have a schema for your code and a schema for your data. You can redeploy the code schema independently of the data schema and set up permissions so that higher layers can only use objects from the code schema and never touch the data.
> Put complex logic in there and all that is really being accomplished is now you can't migrate away from PostgreSQL.
Say you wrote your code in PHP, now you want to migrate to Ruby or NodeJS. You can't, you have to rewrite everything. How often do you plan to migrate to another database? In my experience this almost never happens in reality, but the layers above come and go.
That all just sounds a touch complicated compared to data in the database and code somewhere else (like Git). I'm circling back to the same point in a couple of different ways, but pgSQL specialises in the relational model of data. That isn't a great model for code, and there are already better ways to manage code than shoehorning it into the database. It's cool that it is possible, and I'm not saying that someone who does it is making a mistake. But I also don't think they are gaining an advantage, and there is a really easy opportunity to separate out where bugs can occur from where data lives.
> How often do you plan to migrate to another database? In my experience this almost never happens in reality, but the layers above come and go.
If you are using a database for its advanced general-purpose programming capabilities? Then the chances probably get a lot higher.
Databases that are a store of data don't need to change because they already do their one job (disk <-> relational model translation) really well. If they are pressured to do 2 or 3 tasks like business logic then suddenly it is a lot more likely that there will be pressure to swap databases.
If I were using SQLite and someone wants to do fancy triggers then maybe I need to swap to PostgreSQL. Avoiding that sort of decision making is a great reason to seal the data completely away from the code.
You use Postgres as a deployment target and not as a replacement for git. It's not complicated at all. You even get features like transactional deployments and the ability to prohibit applications from directly touching the data.
> pgSQL specialises in the relational model of data
The relational model is SQL:92; Postgres does much more than that. Postgres has JSON support, recursive CTEs, Row Level Security, and window functions, where a single OVER in a SELECT clause can replace dozens of lines of procedural code.
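For instance, a per-customer running total, which would otherwise need a loop with per-customer accumulators, is a single window expression (schema invented):

```sql
SELECT customer_id,
       placed_at,
       total,
       sum(total) OVER (PARTITION BY customer_id
                        ORDER BY placed_at) AS running_total
FROM orders;
```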
> If I were using SQLite and someone wants to do fancy triggers then maybe I need to swap to PostgreSQL.
If you want to put your business logic into an RDBMS, you wouldn't be using an embedded DB anyway, but rather Oracle, Postgres, SQL Server or DB2, which are designed for this type of architecture.
Could you share a few cases where it is better to use a custom type as opposed to a relation? The only things I can think of are generic things like uuid. Would there be a need to create an Employee type vs an Employee relation?
Also, how is the experience of using Python for stored procedures? One reason stored procedures are not used more is that PL/pgSQL is unfamiliar to most. If anyone has experience with that, could you share some of your thoughts?
No, you could always do the materialized equivalent via triggers, so it's not necessary for correctness. And since this feature is limited to the materialized form, too, it doesn't offer much change other than simpler expression of what is going on (which, to be fair, is a big win.)
> I fear for putting business logic and meanings into the wrong the layer
If you don't have some mechanism for calculated columns, you are forced to put domain logic that belongs in the DB in the app layer, which is especially problematic in the (increasingly out of fashion, apparently) circumstance where there are multiple consumers of a DB, as it promotes inconsistency.
If it's something that's not application-specific and fits nicely into the query language (like in the example given) it makes sense to have it inside the DB, because for queries involving sorting, limiting, etc. you can do all that stuff inside the DB and return the few values that interest you, instead of transferring all the data and having to write code to do that inside your app.
What is missing is a modern tool chain that addresses the painpoints of such a development model.
Say you suddenly don't have data for a column anymore, because the API that you read from has stopped providing it. Now you can add a computed column that uses a heuristic as an interim until you have time to remove all the reads from that column.
Or say that you have an object with a separate display name, and product management decides that the display name now can't be chosen freely anymore, but must be generated from name and title. A computed column can do that trick pretty well.
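A sketch of that second case, assuming a hypothetical `items` table with `name` and `title` columns:

```sql
-- display_name used to be freely chosen; now it must be derived.
ALTER TABLE items DROP COLUMN display_name;
ALTER TABLE items
  ADD COLUMN display_name text
  GENERATED ALWAYS AS (name || ' (' || title || ')') STORED;
```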
I'm often creating wide, flat views with extended attributes. Many columns are nearly identical, for example, year as an integer, year as a date type, etc. This is trivial with views, but ...
For fast query performance, I usually materialize those views. But that's a multi-step and often manual process, requiring schedules or triggers. This feature combines both of those worlds with fewer total database objects, especially since generated columns can be added with ALTER TABLE.
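For example (hypothetical `events` table; the generation expression must be immutable, so this assumes `occurred_at` is `timestamp` without time zone):

```sql
-- Year as an integer, precomputed at write time.
ALTER TABLE events
  ADD COLUMN occurred_year int
  GENERATED ALWAYS AS (extract(year FROM occurred_at)::int) STORED;

-- Year as a date (first day of the year).
ALTER TABLE events
  ADD COLUMN occurred_year_start date
  GENERATED ALWAYS AS (date_trunc('year', occurred_at)::date) STORED;
```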
One example is where we need to store a special identifier code. The identifier code is required by law to have a certain format, and part of it is the date the code was generated.
Users frequently want to view items with codes generated on a specific day. For making presentation and reporting easier, as well as significantly faster, we added a computed date column that extracts the date from the code and indexed it.
Keeping an extra field in sync in code would inevitably have led to bugs, especially as we have several different codebases which could update the ID code.
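A hedged sketch of that kind of column, with an invented code format like `ABC-YYYYMMDD-NNNN` stored in a `code` column. It avoids `to_date`, which may not be marked immutable, and builds the date from immutable pieces instead:

```sql
ALTER TABLE items
  ADD COLUMN code_date date
  GENERATED ALWAYS AS (
    make_date(substr(code, 5, 4)::int,   -- YYYY
              substr(code, 9, 2)::int,   -- MM
              substr(code, 11, 2)::int)  -- DD
  ) STORED;

-- Generated columns can be indexed like any other column.
CREATE INDEX items_code_date_idx ON items (code_date);
```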
Works great if you’re sure you won’t ever need to scale out massively. One of my RoR apps connects to a schema of views, with INSTEAD OF triggers calling procedures for updates. Something like 30 normalized tables are rolled up into 5 denormalized views, leaving hardly any ActiveRecord woes and still excellent performance.
> There are two kinds of generated columns: stored and virtual. A stored generated column is computed when it is written (inserted or updated) and occupies storage as if it were a normal column. A virtual generated column occupies no storage and is computed when it is read. Thus, a virtual generated column is similar to a view and a stored generated column is similar to a materialized view (except that it is always updated automatically).
Unfortunately some tooling is choking on this; for instance, DataGrip crashes on introspection. But hey, that's what we can expect for being early adopters. :-)
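Worth noting: the quoted passage describes both kinds, but PostgreSQL 12 itself implements only the stored kind, e.g.:

```sql
CREATE TABLE measurements (
  height_cm numeric,
  -- VIRTUAL is not available in PostgreSQL 12; only STORED is accepted.
  height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);
```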
IMO these types of calcs are better done in your app, where you can at least write a test and assert it's doing the right thing. It also makes it a lot easier to reason about your code if the logic is in the app rather than bits of it stuck in column definitions.
Perhaps there are legitimate uses of this, maybe for DBs that aren't just repositories for apps?
1) the assertion that code in the DB can’t be tested is a bizarre and unfounded one
2) in any serious organisation there may be dozens of apps in a dozen different languages talking to the DB. Do you seriously propose implementing the same thing in each one, or doing it once in the DB and knowing it’s correct for everyone?
Why is a trigger any more surprising than any callback style interface? Or using inotify (Linux) or reparse points (Windows)? Triggers are very easily discoverable, they are attached along with their source code to the table!
A view is just a named select statement, that’s all it is.
A procedure call is explicit; a trigger is implicit. You don't call a trigger, it just happens as a side effect of something else. People tend to forget implicit things. Suddenly you notice that something is acting strangely or slowly in your application. You can look at your functions and procedures and try to find the problem. But if your application is full of triggers, how do you know what is going on? A trigger can change a dozen rows, which in turn can change other rows, so changing a single row can set off thousands or millions of trigger firings. Also, triggers are not fired in a particular order; the database is free to change the query plan according to what it thinks is best at the moment, so triggers are not deterministic. They can sometimes work and sometimes not.
In summary, triggers are implicit, have side effects and are not deterministic. They are confusing and surprising. Almost everything that can be done with a trigger can be done with a procedure, but explicitly, deterministically and in most cases even without side effects.
You want centroids and bounding boxes precalced for all the geom you are pushing?
Now you can. Otherwise these can be pretty expensive operations.
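A sketch of what that could look like with PostGIS (table and column names invented; `ST_Centroid` and `ST_Envelope` are immutable, so they qualify for generation expressions):

```sql
-- Precompute centroid and bounding box for each stored geometry.
ALTER TABLE parcels
  ADD COLUMN centroid geometry
  GENERATED ALWAYS AS (ST_Centroid(geom)) STORED;

ALTER TABLE parcels
  ADD COLUMN bbox geometry
  GENERATED ALWAYS AS (ST_Envelope(geom)) STORED;

-- Indexable like any other column:
CREATE INDEX parcels_centroid_idx ON parcels USING gist (centroid);
```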
I can write a test for the db logic, too.
> It also makes it a lot easier to reason about your code if the logic is in the app rather than bits of it stuck in column definitions.
It's a lot easier to be confident that all consumers of the DB, regardless of whether they are coming through a particular app, have the correct view of the data if the non-app-specific domain logic is in the database.
I agree, but I would add that having testing niceties like branch coverage isn't really possible for SQL queries / PL/pgSQL functions.
To test your queries all you need are unit tests of the standard form: for given inputs, assert(output).
I’m not aware of any coverage tools, though it’s been a while since I’ve looked.
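A minimal in-database test of that form, assuming a hypothetical `people` table whose `full_name` is a generated column (pgTAP offers a richer framework; this is plain SQL with an `ASSERT`):

```sql
BEGIN;
INSERT INTO people (first_name, last_name) VALUES ('Ada', 'Lovelace');
DO $$
BEGIN
  ASSERT (SELECT full_name FROM people WHERE first_name = 'Ada') = 'Ada Lovelace',
         'full_name was not generated as expected';
END $$;
ROLLBACK;  -- clean up the test fixture
```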
In general though, materialized views and computed columns are much more useful for reporting than replacing application logic.
They can also be useful to smooth out upgrading line-of-business software, where, say, the old system needs several separate values but the new system only needs one and derives the others. You would use something like this to make the two systems play nice until the migration is complete.
When we ELT to the data warehouse, analysts and internal users want to report on and reason about these calculations. But then we face the quandary — do we re-implement that business logic in the ELT? Or do we go back and make sure we persist everything, even these values that are so quick and light to calculate and build in the application layer? Or do we (shudder) make the ELT load from a bulk API, rather than just copying across the DB?
I could imagine someone talking themselves into using these calculated columns as a solution. I wouldn’t recommend it, but I can see the ELT problem prompting it.
SQL is code. PL/pgSQL is code. Code can be tested. Code should be tested. You don't even need to leave the database to write tests for SQL or PL/pgSQL.
EDIT: sorry, I read your comment the opposite way. I see we are saying the exact same thing :)
This seems a lot more robust to me.
But there are also plenty times where I'd rather compute at the database side; there are no silver bullets.
Using a computed column ensures the data is consistent.