
Myth: Select * is Bad - fatalmind
http://use-the-index-luke.com/blog/2013-08/its-not-about-the-star-stupid/
======
bcoates
He's missing the other half of why 'select *' is bad, which really is about
the star: Selecting columns by name automatically makes almost all
incompatible schema changes cause the query to fail (in an easily detected
place, for an obvious reason), while still allowing non-breaking schema
changes like adding or dropping columns you don't use to have exactly the same
behavior as before.
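A minimal sketch of that difference, using SQLite and a hypothetical `users` table (the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'a@example.com')")

# An incompatible schema change: a column the query depends on is renamed.
conn.execute("ALTER TABLE users RENAME COLUMN email TO email_address")

# The explicit column list fails immediately, in an obvious place...
try:
    conn.execute("SELECT id, name, email FROM users")
    failed_loudly = False
except sqlite3.OperationalError:
    failed_loudly = True

# ...while SELECT * carries on, leaving callers that look rows up by
# the old column name to discover the breakage much later.
cur = conn.execute("SELECT * FROM users")
columns = [d[0] for d in cur.description]

print(failed_loudly)  # True
print(columns)        # ['id', 'name', 'email_address']
```

(RENAME COLUMN needs SQLite 3.25+, which any modern Python bundles.)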

~~~
ams6110
'select <star>' also doesn't always do what you expect.

I recall a problem with some Oracle views that were based on 'select <star>'
queries. What we didn't realize is that the '*' is evaluated when the view is
compiled, so if you later add columns to the underlying tables the view does
not automatically include them.

Edit: OK I don't know how to include a literal star in text here.

~~~
TeMPOraL
Try typing 3 stars in a row: __* :).

~~~
ams6110
No luck

------
jacques_chester
The main danger with mixing SELECT * and ORMs when you only need some
particular columns is the performance hit.

Not just on the database itself. You also need to ship surplus data to your
application and then instantiate objects with surplus fields. All of this
takes bandwidth, RAM and time.

I also once saw someone write idiomatic ruby that performed a sum over an
ActiveRecord .each. In dev, nobody noticed. In production it pulled a 9
million row table across the wire and iterated over 9 million ruby objects,
all to sum up a few hundred KB of actual data - a sum the database could have
done itself in a few hundred ms.

The problem is hoping that ORMs will make those icky databases _go away_. They
don't. They can't.
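The anti-pattern above can be sketched in plain Python with `sqlite3` standing in for ActiveRecord (the `payments` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, amount INTEGER, memo TEXT)")
conn.executemany("INSERT INTO payments VALUES (?, ?, ?)",
                 [(i, i % 100, "x" * 100) for i in range(10_000)])

# The .each anti-pattern: every column of every row crosses the wire,
# one object per row, just to add up a single field.
slow_total = sum(row[1] for row in conn.execute("SELECT * FROM payments"))

# Pushing the aggregate down to the database returns a single number.
fast_total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]

print(slow_total == fast_total)  # True, but one of them moved 10,000 rows
```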

------
codeulike
He says it's a myth, then admits it isn't a myth.

In conclusion: Clickbait for DBAs

~~~
corresation
His argument seems to be that it's a "myth" because you could do the same
thing by explicitly listing out the columns. That in no way makes it a myth;
it's pure clickbait, as you said.

It gets much worse than just single table grab-alls, though. I once worked at
a shop where they insisted upon prolifically using "select just about
everything from every relational table, comprehensively joined to the n-th
level" views as some sort of confused notion about code reuse, such that
everything would then select from these master views. The end result was that
it was incredibly frustrating trying to resolve performance problems, because
literally everything was a performance problem. It was one of those instances
where I shook my fists at every naive developer railing about purported
premature optimizations.

~~~
mercurial
"Don't go out of your way to optimize prematurely" does not mean "architect
your software in the opposite direction to industry common sense". "We will
optimize later if needed" _does not work_ at architecture level. That's why
you prototype before committing yourself.

~~~
corresation
_"Don't go out of your way to optimize prematurely" does not mean_

It doesn't mean anything, because both "optimize" and "premature" are in the
eye of the beholder. There is no consensus on this at all, but it has been my
experience that when those famous words are uttered, badness is about to
occur.

------
mistercow
It seems to me that in discussions of NoSQL vs relational databases, one of
the main arguments that comes up regarding ease of design and maintenance is
"you can just use an ORM". But I see major disadvantage after major
disadvantage of using an ORM, and there seems to be some overlap with the
disadvantages of NoSQL (by which I am primarily referring to document-oriented
databases). I really can't tell, once you factor these disadvantages in, what
comes out on top.

~~~
bcoates
The ORM I really want just takes my SQL and a connection to my database at
compile or load time, and wires it up to the type system of the language I'm
using. All the ones I've tried seem to embrace the awfulness of the low-level
database access APIs, and try to replace the good-enough DSL that is SQL with
something worse.

~~~
tmcneal
The closest I've seen an ORM come to this is Squeryl:
[http://squeryl.org](http://squeryl.org)

~~~
oacgnol
Which has been superseded by Slick
([http://slick.typesafe.com/](http://slick.typesafe.com/)), which does
exactly what the parent describes.

------
trollied
Crikey.

SELECT * can be bad for the following reasons:

- Returning more data than you need over a network connection can be a
bottleneck. The ORM/Java program written by a programmer who doesn't care may
only need a few columns. Tables can be very wide, especially when CLOBs/BLOBs
are involved.

- Stored procedures, functions and triggers that use SELECT * may break
horribly when columns are added to or removed from tables, views or
materialised views.

- Table types (collection and record types) will break if any column
redefinition occurs.

- It shows a lack of understanding of the data by the developer. This may, in
turn, cause further issues.
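A small illustration of the "stored SQL breaks horribly" point, sketched in SQLite with made-up `orders`/`orders_archive` tables (the classic culprit is an `INSERT ... SELECT *` whose source table later grows a column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total INTEGER)")
conn.execute("CREATE TABLE orders_archive (id INTEGER, total INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 100)")

# Works today: both tables have the same shape.
conn.execute("INSERT INTO orders_archive SELECT * FROM orders")

# Someone adds a column to the source table...
conn.execute("ALTER TABLE orders ADD COLUMN discount INTEGER")

# ...and the SELECT * now supplies three values for a two-column target.
try:
    conn.execute("INSERT INTO orders_archive SELECT * FROM orders")
    broke = False
except sqlite3.OperationalError:
    broke = True
print(broke)  # True
```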

Proper schema design and understanding of the data involved is key when
working with databases. The existence of NoSQL, the general dumbing down of
Computer Science degrees, and the teaching of ORMs have caused endless
problems. The need for NoSQL in the first place was possibly caused by the use
of high-level programming languages where you didn't need to know the data -
you just fetch it all, then piss around with it in ruby/whatever, then fetch
it again & again until you're done.

Rant over ;)

------
einhverfr
I use select * a lot in PostgreSQL. I do this any time I have a relatively
self-contained table which can't be normalized much further, and CRUD
queries wrapped behind a stored procedure which returns the table type.

The reason I use select * there is that it reduces a maintenance point when it
comes to return types and tables. In that case, I am guaranteed to get a
return type which matches the table type. If the two are closely linked, then
select * makes a lot of sense, and the alternatives aren't going to do better.

This makes schema changes _more compatible_ than trying to change the column
everywhere in every query in addition to the higher levels of code.

I would generally agree that an application calling select * from table is a
bad thing for the same reasons, but if you are encapsulating your db behind an
API, a lot of reasons shift.

~~~
jacques_chester
I have the opposite preference. If there is a big bag of breakages when I
change the schema, at least I see where they are. I'm not a fan of relying
solely on tests when the DB itself can notice introduced discrepancies.

~~~
einhverfr
The fundamental question is where and what you are doing.

As I say, I use select * a lot, but these are in two groups:

1. In UDFs, to ensure the proper return type, and

2. Against UDFs, where it is the UDF designer's job to maintain the software
interface contract.

In those cases, select * works, but it only does so because you have
dependency injection and can decide on a case-by-case basis whether to pass
the changes up, wrap them in a view, or the like. Most of the time this is a
simple choice. The app needs a new column that we are storing and so.... But
in a multi-app environment there may be other choices, and that dependency
injection makes all the difference in the world.

EDIT: for example, if you have a UDF returning the type of a given table, then
you may want to use select * because you have already decided these are to be
tied together. Similarly select * from my_udf() is not a bad thing the way
that select * from my_table is.

~~~
jacques_chester
Ah, well. I guess my point was about selecting from tables, which is still the
common case.

I presume that if you find yourself picking columns out of UDFs, it's a code
smell that the UDF needs rethinking.

------
aidos
There used to be a bug in either MySQL or SQL server (can't recall which
anymore) where if you used select * in a view and then adjusted the schema the
data would be out of line with the columns.

~~~
nathanstitt
SQL Server, and the 'feature' was still present in the 2003 version; not sure
about versions after that.

~~~
codeulike
SQL Server's behaviour is: if you have a "select *" view and then add more
columns to the underlying table, the new columns will not show up in the view
until the view is refreshed.

That's not too bad, and it does not get the column names mixed up.

~~~
sciolistse
That's the way it works now. In earlier versions any added columns would
still be retrieved, shifting all the other column values to the right, so all
the data would wind up under the wrong column names.

~~~
codeulike
I've been using MS Sql from version 7, through 2000, 2005, 2008 etc, and I
never saw that bug.

------
acscott314
Why is this article on HN? Come on. Select * is just a construct. When is it
bad? That would be a better discussion. In any case the article demonstrates a
lack of fundamental understanding of relational database technologies, such as
clustered indexes... hm wait, I'm taking the bait. Read a good book on
relational databases; SQL for Smarties will get you started.

Is select count(1) from mytable faster than select count(*) from mytable?

------
erehweb
Bad in column-oriented databases like Vertica.

~~~
mrich
It is bad because in a column store each column will be stored in a different
location, resulting in seeks on disk or cache misses in memory. In a row store
the complete row will often be in a contiguous block of memory. It gets worse
when the column store uses dictionary compression and the dictionary is not
already cached, since you need one more memory access to a random location.
For wide tables the overhead is noticeable and the application should really
only retrieve what is necessary.

------
ohwp
SELECT * can be bad when you join other tables. For example:

      SELECT
          *
      FROM table1
      JOIN table2
          ON table2.field = table1.field

In the above example you select everything from both table1 and table2. When
they contain fields with the same name, strange things can happen.

Therefore always use: SELECT table1.* FROM table1
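One of those strange things, demonstrated with SQLite and two made-up tables that share column names: the joined result set carries duplicate column names, so anything that maps rows by name silently picks one of the duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field INTEGER, name TEXT);
    CREATE TABLE table2 (field INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'from-table1');
    INSERT INTO table2 VALUES (1, 'from-table2');
""")

cur = conn.execute(
    "SELECT * FROM table1 JOIN table2 ON table2.field = table1.field")
columns = [d[0] for d in cur.description]
print(columns)  # ['field', 'name', 'field', 'name'] -- two ambiguous pairs

# Mapping by name keeps whichever duplicate comes last; the other is lost.
row = dict(zip(columns, cur.fetchone()))
print(row["name"])  # 'from-table2'
```

Qualifying the star (SELECT table1.*) sidesteps the collision.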

------
groundCode
Select * can also be bad if you have any sort of system that addresses your
columns by index. Someone changes up your database, adds a column or some
such, and all of a sudden your consuming code breaks all over the place.
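For instance, with a hypothetical `events` table in SQLite, code that unpacks a SELECT * row positionally breaks the moment a column is added:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'click')")

# Consuming code that relies on the positions SELECT * happens to return:
event_id, kind = conn.execute("SELECT * FROM events").fetchone()

# Someone adds a column...
conn.execute("ALTER TABLE events ADD COLUMN payload TEXT")

# ...and the same consuming code now blows up.
try:
    event_id, kind = conn.execute("SELECT * FROM events").fetchone()
    broke = False
except ValueError:  # too many values to unpack
    broke = True
print(broke)  # True
```

Naming the columns in the query would have kept the old code working untouched.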

~~~
VLM
You can also experience severe performance degradation (both server and
client) if someone adds an absolutely huge binary column.

Take table checking_account_register as an example: let's say we add a new
feature where, after a paper check is cancelled, we scan it into an image file
and stick it in the database. Suddenly you get a giant TIFF with each row
returned, surprise! Of course a better spot might be a separate scan table
linking checks to an image of the check (perhaps multiple images, multiple
scan attempts, multiple sides of the check, and all that), but for the sake of
argument, etc.

I think probably more databases get killed by processing load via no WHERE or
LIMIT clause than from using a * as a column list... probably. With a close
second of gathering way too much data and weeding it out in a HAVING.

~~~
DrJokepu
Of course, adding large binary blobs to a relational database is almost
certainly a bad idea anyway. It's just not what they're meant for. You can
achieve transactionality by other means, and it won't actually make managing
and backing up data easier, because suddenly database backups are an awful
lot larger.

------
ck2
If you have text fields in your columns and you don't actually need them,
SELECT * is indeed bad, as it will cause temporary tables to be created even
if indexes exist for the query.

------
ronaldsvilcins
Thanks for sharing your experience with Firefox OS.

~~~
aray
What's the context to this? It doesn't make sense to me in relation to the
article, but maybe I'm missing something.

