Hacker News new | comments | ask | show | jobs | submit login
Selecting All Columns Except One in PostgreSQL (jooq.org)
64 points by kawera 9 months ago | hide | past | web | favorite | 48 comments



"do you really need a billiards table?"

http://pbfcomics.com/comics/billiards-in-heaven/

I mean I guess it's a cute bit of syntactic sugar but are you really going to die if you have to manually specify the columns you want? That's a vastly safer/less brittle practice.


> I mean I guess it's a cute bit of syntactic sugar but are you really going to die if you have to manually specify the columns you want? That's a vastly safer/less brittle practice.

No, it's a basic piece of functionality that should have existed a long time ago.

I'm reminded of a quote, I think from pg that goes like "programming languages teach you to not want what they can't give you."

It's absolutely true. I see few language features met with comments like yours and I just wonder, why? Why can a feature only be added if the alternative is that we're "going to die?"

As somebody who does SQL for a majority of their work, this feature is so overdue. I briefly used RethinkDB from 2013-2014 and RQL supported this functionality. What a treat it would be to be able to do this on Postgres.


I'm in the opposite camp. Even small features are very expensive. You need to support them for the life of the product. You need to think about them every time a new feature, bug fix, anything comes along. I prefer products that have smaller feature sets and those features are rock solid and very well thought out than just throwing every possibility in because you can.

If skipping EXCEPT means say the postgres team can focus that much more energy on bugs and stability, then that's a good trade off in my eyes.


Except that's not how real software is built. Real software has to add new features to remain competitive. Just look at a few of the features for Postgres 9.4 to 10:

JSONB data type, ALTER SYSTEM statement for changing config values, ability to refresh materialized views without blocking reads, dynamic registration/start/stop of background worker processes, Logical Decoding API, GiN index improvements, Linux huge page support, database cache reloading via pg_prewarm UPSERT, row level security, TABLESAMPLE, CUBE/ROLLUP, GROUPING SETS, and new BRIN index[113] Parallel query support, PostgreSQL foreign data wrapper (FDW) improvements with sort/join pushdown, multiple synchronous standbys, faster vacuuming of large table Logical replication, declarative table partitioning, improved query parallelism

We'd all love for software to be bug-free and stable, but features have to be added.


For sure, that is the reality we live in. Doesn’t mean it’s a good thing though. There’s even been some push back to this trend, such as Apple reconsidering how they do release cycles due to iOS becoming so buggy.

It also depends on what kind of software we are talking about. A trendy mobile app is very different from a critical database.

I personally prefer software that’s more conservative in its approach to new features. It’s becoming harder to find software that has this philosophy but not yet impossible.


Nobody is arguing that products shouldn't have features. But there is a real cost to each feature that you add, and so you have to weigh the tradeoffs before blindly building everything anyone asks for. Selecting all columns but one seems like it would bring much less value than JSONB.


The parent comment said he preferred software with less features, and favored bug fixing and stability over feature X. Feature X could be swapped with any request in Postgres's backlog.

I agree with you: all features have to have a cost and value. The market decides what features are valuable, and they get built and supported. SELECT EXCEPT is not one that the market values yet. And that's fine.


> The market decides what features are valuable, and they get built and supported.

I'm curious as to what you envision as "the market" in the context of free software, or Postgres in particular, since I don't expect it's the cliche of people literally deciding which product to pay for among competing ones.

A couple possibilities came to mind, but, having no intimate familiarity with any major project like this, I didn't want to parade mere assumptions as multiple choice.


What happens if you ALTER TABLE ADD COLUMNs? The fact that you can add Oracle triggers that fail after the fact is one of the things I hate most.

Adding columns can break EXCEPT queries. This is one of those situations where adding a feature can break a lot of other stuff after the fact.


That house left the barn with SELECT *, which is sensitive to the addition of new columns.


The feature would be useful when you want all columns except one or two specific columns.

If new columns are created they should also be in the result set.

If that breaks your code except was not the right call because you specifically needed a subset of columns, not the inverse.

Your laziness and abuse of the inverse is the problem, not the feature it self.


Thank you.....this mentality of don't add any new feature that might cause a problem, on a hacking website of all places, is bizarre.


Actually what I would really find useful is the ability to select all columns except 2. In rails I sometimes dont need to see created_at and updated_at


Right, but why do you want to force the query planner to have to do an enumeration? And what happens if a column is added to the table? What if you use positional indexing instead of column names in your application (it performs better in many systems) and the column order is changed? Does your RDBMS or the SQL standard even guarantee the order of columns returned by a *? The fact that you don't know for sure should be reason enough to give you pause.

Syntax like this is fine for ad hoc queries, but not in a query that's part of an application. You're writing those queries once to execute hundreds, thousands, millions of times. Making your query concrete is probably wise.


> but why do you want to force the query planner to have to do an enumeration?

Same as anything else in software: I considered the pros and cons, and made a choice to go a particular route.

> And what happens if a column is added to the table?

It will show up in the query, as noted in the documentation.

> What if you use positional indexing instead of column names in your application (it performs better in many systems) and the column order is changed?

Then you will have a bug. Don't use both at the same time.

> Does your RDBMS or the SQL standard even guarantee the order of columns returned by a ?

Read the documentation. If order is not guaranteed, do not assume it will always be the same.

> Syntax like this is fine for ad hoc queries, but not in a query that's part of an application.

It's my money, I'll* decide what belongs in my application and what doesn't.

> You're writing those queries once to execute hundreds, thousands, millions of times.

You are speculating.

All of the above arguments could be made about 50% of functions existing in software, yet life goes on. The world isn't that fragile.


    CREATE TABLE safe_users AS
    SELECT *
           EXCEPT users.pii
      FROM users
    ;
Lots and lots of uses for this feature. I often wish I had it in Redshift.


I suppose it could be useful for keeping your "breach notification team" busy, after somebody (for whatever reason) adds a "users.hiv_positive" (boolean) column or some such and that column ends up being included somewhere you didn't intend it to be.

"Root cause analysis: The {database administrators|programmers} were too lazy to explicitly define the column names they wanted, resulting in the unintended disclosure of HIV positive status of $bignum individuals."


If you had a data structure with your fields, you could do that much more explicitly.

Not a Ruby guy but the Ruby equivalent of something like:

  rejected  [:updated_at, :inserted_at]
  to_select = 
    Enum.reject(fields, fn field -> 
      field in rejected
    end)


In SQLAlchemy it could look something like this

  cols = list(table.columns)
  cols.remove(table.columns['updated_at'])
  cols.remove(table.columns['inserted_at'])
  query = select(cols).select_from(table)
(You can abbreviate table.columns with table.c)

With sqlalchemy.orm you could also define a Bundle, see https://docs.sqlalchemy.org/en/latest/orm/query.html?highlig...


Much simpler usint the ORM - you can defer columns:

http://docs.sqlalchemy.org/en/latest/orm/loading_columns.htm...

This is both at the 'object' level, and at the query level for example at the query level: from sqlalchemy.orm import defer, undefer

query = session.query(Book) query = query.options(defer('summary')) query = query.options(undefer('excerpt')) query.all()

You can also group columns for deferral/undeferral.

I use this loads.


Interesting, I knew about defer and used it in the past for pretty much the purpose the first example illustrates, but didn't remember/realize it could be used on a per-query basis.

I think if you strip columns for performance reasons for one specific query it would still be better to use Bundles or the equivalent loose select (i.e. session.query(Book.id, Book.title)) instead of query-local-deferred columns, because then you will notice if a code change leads to a down-stream access of an unwanted column for some reason.

­-- as usual sqla has at least one clean solution handy for almost any problem.


what sql does this produce though? I can imagine that there'd be a hit on performance


Why would you imagine that? This compiles to SELECT col1,...,coln FROM table ...;


Probably something like:

  to_select = fields - [:created_at, :updated_at]
Though there’s always a slight uncomfort in not being 100% sure the array elements aren’t strings, so might throw in a cast here just to be sure.


You may not have all the column names for example when you are working with a crosstab query


I suppose this is nice for adhoc queries in a repl, but I hope no one writes it in an application. Heck, I don’t even use “select *” in code


Increasingly, there are companies where more SQL is written by analysts sifting through data than by engineers in application code querying databases. And by a huge factor.

This sort of functionality shouldn't be used in application code (ironically, exactly as it's being shown in the example) but is a real treat for analysts and engineers writing adhoc queries over large tables.


I see this sentiment around a lot and can you explain why? I hope it’s more than just an optimization thing. Getting rid of an extra column or two strikes me as premature optimization in scenarios that are not “web scale”


The result format of select * is unpredictable unless you are guaranteed to know your current schema. And if you know your current schema and want all columns, for an ORM it's trivial to generate a list of all the columns you want, so there's no need to use star.

For example: Let's say you have a piece of code that does: id, value = query("select * from tbl").fetch_one(). Now that works great if you know you have exactly the columns id and value in there. But let's say they swap places (possible in mysql), or a third column is added, and you don't update your code alongside it. Now that code breaks. Had you done "select id, value from tbl", it wouldn't break.

PS: The sibling answers are also all correct; there's more reasons not to use select * outside of the shell.


Select * is useful if you are adding columns. You can either read the column or supply a default if the column doesn't exist. Doing this by having two queries naming different lists of columns is much worse, although it's unavoidable for mutations AFAIK.


Gotcha yes for sure in this example you would want to use the column names. In my database code I extract values by referencing the column names - not by their position. So neither swapping or adding a new column breaks the code.

Well thank you for explaining your reasons.


Do you mean your database driver doesn’t supply you with field names, that you rely on order to map up data?


I guess I often rely on column order in Go, given Rows.Scan() [0]... so `select ` would be scary since someone may ALTER the table.

Maybe that's the antipattern! But it's also a feature so I feel like you're better off avoiding the . Also, if someone adds additional big columns, you may incur additional, unnecessary cost

[0]: https://golang.org/pkg/database/sql/#Rows.Scan


Your driver can probably introspect, yes. You can also probably introspect yourself if you want to. But all this to be less explicit about what you're querying seems like a waste, doesn't it?


> for an ORM it's trivial to generate a list of all the columns you want, so there's no need to use star.

But doctor, it hurts when I [don't use an ORM/ORM-generated queries]


If you're not using an ORM you especially shouldn't be using the star-op in code.

Don't litter in the streets, don't litter in your code. If you make an unmaintainable mess and it hurts, you are entirely responsible for owning it.


When writing a query with multiple joins and functions, the query plan will build a pipeline that pulls the data from storage if it does not already exist in the cache.

Then it has to process the data based on the declarative statement that was sent to the query engine. This means it may need temporary spools in memory or on disk to handle sorting, hash joins, etc.

Sometimes the engine is able to optimize this pipeline and only pull in the columns used in the outermost part of the query.

Being explicit in your column selection can greatly reduce the amount of memory and CPU needed to generate the results.

Probably the worst impact this can have is when an application is using a query with SELECT * on the data side when only a handful are used in the application.

Outside of disk IO, memory usage and CPU usage, now you have incurred an overhead by sending data across the network that is never being used.


Writing "SELECT *" makes it less clear what columns are selected exactly. Explicitly writing what columns shall be returned is more explicit and clear.


That's helpful from a code review perspective but it's the performance hit you get with a * select that is the bigger reason to avoid it.


The column name conflicts can cause no end of trouble later on as well.


* EXCEPT is valuable for VIEWs, e.g. VIEW of JOIN with multiple tables containing the same column. If those tables get new columns added, you want the view to automatically include those new columns.

Yes, obviously those new columns can have name conflicts and break the view just the same -- but I'd much rather have compile-time errors than the scaling nightmare of having to remember to explicitly name dozens of columns each time, to make sure the new columns appear in the view.

Obviously, ORMs address these sorts of problems at the cost of portability between programming languages and third party tools (that don't use your ORM). Yuck.


<rant type="off-topic">

In case the author happens to be reading the comments here:

Whatever you have done that messes with the scrolling and the ability to open a link in a new tab is not an "enhancement" at all (on Safari on iPad, at least) and caused me to "bounce" from your web site much quicker than I would have otherwise.

I wouldn't normally complain about this here on HN (apologies to my fellow HN'ers) and would instead leave a comment directly on the site, but I wasn't even able to do that because of the UI hijacking that is taking place.

Great job making the web even more unusable!

</rant>


SELECT * EXCEPT syntax would be killer when you get into dynamic SQL and especially when need to pivot on shit you don’t know about in advance.


Yes it should be a feature in all databases


The tasks of dropping columns and also of replacing one column value while brining over all others are very desirable data transformations. Codd's algebra had related features (project/extend and maybe an assign) and they showed up in Tutorial D's EXTEND keyword. It is definitely something I think is important enough to add to the rquery R SQL query generator I am working on https://winvector.github.io/rquery/ ( drop_columns(), and extend_*se()'s overwriting behavior). And the R community is already familiar with such piped relation style notation in the famous dplyr package.


is EXCEPT part of the SQL standard?


Yes, but not for this usage. It’s a set combiner like UNION and INTERSECT. IIRC, Tutorial D has an operator like this use of EXCEPT for attributes called ALL BUT.


Not in a column specification of a SELECT query.




Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: