It would be useful if the data were loaded into different RDBMSes and their execution plans compared. My understanding is that MySQL has the worst query planner and Oracle the best, with MSSQL somewhere in between: close to PgSQL, but better in a few areas.
I think MySQL basically gives up optimizing the moment you ask it to do a correlated subquery.
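To make that concrete, here's the classic shape that older MySQL versions handled poorly (table and column names are hypothetical): the inner query references the outer row, and instead of rewriting it as a join, the planner would re-execute it once per outer row, showing up as a DEPENDENT SUBQUERY in EXPLAIN.

```sql
-- Find each customer's largest order. Older MySQL would run the
-- inner MAX() once per row of the outer scan rather than decorrelating.
SELECT o.id, o.customer_id, o.total
FROM   orders o
WHERE  o.total = (SELECT MAX(o2.total)
                  FROM   orders o2
                  WHERE  o2.customer_id = o.customer_id);
```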
I use MSSQL at work and have given it some really dumb schemas and really dumb queries, and it still manages to pick good plans. It wouldn't surprise me if MSSQL generated the "optimal" plan from the first query.
It doesn't choose good plans by a long shot if you break 3NF or skip the most obvious indexes. Otherwise the only things to watch out for are stale stats and parameter sniffing.
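For the parameter-sniffing case, one common SQL Server workaround is to force a fresh plan per execution; it trades plan-cache reuse for per-call compilation, so it's not a cure-all (table and parameter names here are hypothetical):

```sql
-- Recompile on every execution so the plan fits *this* parameter value,
-- not whichever value happened to be sniffed first.
SELECT *
FROM   orders
WHERE  customer_id = @customer_id
OPTION (RECOMPILE);

-- And keep statistics fresh so cardinality estimates stay sane:
UPDATE STATISTICS orders;
```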
Oracle 12c has adaptive plans which would be a great addition to MSSQL in the fight against parameter sniffing. Basically it figures out that it's probably got the wrong plan ("this nested loop join stopped making sense about 10,000 rows ago") and adjusts it accordingly.
Again, wouldn't be surprised if Oracle got it "right" from the first query.
For all its faults, this is one thing I like about Oracle. No matter how many levels of nested correlated subqueries or other convoluted junk I throw at it, it (almost) always untangles the query into a nicely optimized plan. Of course, this makes it less likely I'll spend time actually thinking about how the query could be cleaned up and streamlined. Which means if I tried to move over to MSSQL or MySQL my queries would probably take forever or fail completely. Ergo, I'll stick to Oracle... ?
I agree with everything you've said; however, I want to add that the general technique of compressing your data through pre-aggregation is a sound one, whether it's performed auto-magically by the optimizer or manually through whatever hooks your DBMS provides.
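Done manually, the technique looks something like this (the log table and join column are hypothetical): collapse the big table first, then join the small aggregated result to the dimension table, instead of joining first and aggregating a much wider intermediate result.

```sql
-- Pre-aggregate the large table down to one row per dashboard,
-- then join the compact result to the dimension table.
SELECT d.name, agg.views
FROM   dashboards d
JOIN  (SELECT dashboard_id, COUNT(*) AS views
       FROM   time_on_site_logs
       GROUP  BY dashboard_id) agg
  ON   agg.dashboard_id = d.id;
```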
At least with Postgres (and MySQL to some degree, but Postgres is probably a better launching point), you could hack the query optimizer to handle situations like this if you really wanted to. I like to think this is how Postgres would like you to approach the problem, instead of creating a less comprehensible query.
My biggest problem with these benchmarks and your queries is covered in this post [1].
If databases.name is unique then you'd have a unique index/constraint on it. Doing this in the environment I set up causes SQL Server to generate identical execution plans for all three queries.
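Declaring that uniqueness is a one-liner (index name is hypothetical), and it's exactly what lets the optimizer treat a group-by on the name as equivalent to a group-by on the key:

```sql
-- Tell the planner that name is a key in its own right.
CREATE UNIQUE INDEX ux_databases_name ON databases (name);
```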
If databases.name is not unique then your query's group by name doesn't make sense. I'll just quote what I wrote verbatim:
> No online retailer is looking at their sales reports and thinking that John Smith is their best customer and Amorette McFredson is their worst.
> ... I don’t know whether the query that was chosen for optimization because it’s good for demonstrating the technique on PgSQL or because it represents a query that is actually in use. I don’t like group by name without knowing whether dashboard names are unique.
> With that said, if it’s not unique then the query is arguably “wrong” because it aggregates results for distinct dashboard names rather than distinct dashboards.
> Removing the unique index on dashboards.name does change the execution plan of the first query – it’s slightly worse than the other two (which remain identical). “Fixing” the first query to group by dashboards.id and dashboards.name causes the better plan to be generated again.
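The "fixed" query described in that last quote looks roughly like this (the log table and join column are hypothetical): grouping by the key as well as the name means two distinct dashboards that happen to share a name are no longer merged into one row.

```sql
-- Group by the primary key plus the name, not the name alone.
SELECT   d.id, d.name, COUNT(*) AS views
FROM     dashboards d
JOIN     time_on_site_logs l ON l.dashboard_id = d.id
GROUP BY d.id, d.name;
```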
Historically the MySQL query planner has been pretty bad, although it's improved significantly in the past couple years. It was particularly bad dealing with subqueries as you mentioned, although several of the performance issues have been solved in MySQL 5.6.
Yup, this. Of course, it hurts that there are probably fewer than 10 people in the world who fully understand the optimizer (and the majority are employed by Monty); though I'm sure the MySQL folks at Oracle are getting there.