Hacker News new | past | comments | ask | show | jobs | submit login

Thanks for the extra testing.

My biggest problem with these benchmarks and your queries I cover in this [1] post.

If databases.name is unique then you'd have a unique index/constraint on it. Doing this in the environment I set up causes SQL Server to generate identical execution plans for all three queries.

If databases.name is not unique then your query's group by name doesn't make sense. I'll just quote what I wrote verbatim:

> No online retailer is looking at their sales reports and thinking that John Smith is their best customer and Amorette McFredson is their worst.

> ... I don’t know whether the query that was chosen for optimization because it’s good for demonstrating the technique on PgSQL or because it represents a query that is actually in use. I don’t like group by name without knowing whether dashboard names are unique.

> With that said, if it’s not unique then the query is arguably “wrong” because it aggregates results for distinct dashboard names rather than distinct dashboards.

> Removing the unique index on dashboards.name does change the execution plan of the first query – it’s slightly worse than the other two (which remain identical). “Fixing” the first query to group by dashboards.id and dashboards.name causes the better plan to be generated again.

[1] http://ta.speot.is/2014/01/25/use-subqueries-to-count-distin...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: