Hacker News new | comments | show | ask | jobs | submit login
Query optimization in MySQL with Subqueries (gergerconsulting.blogspot.com)
26 points by serdarbicer 2443 days ago | hide | past | web | 14 comments | favorite

There's no question that correlated subqueries are piss-poor in MySQL and instead creating derived tables is quicker.

And in most cases it'll be quick enough.

But I have exeprience recently on much larger datasets -- always over a million and often upwards of 10 Million -- and the truth is, 2 queries is better.

I didn't read the SQL closely enough but most cases of subquery (whether in the select, from or where clauses) is to do in 1 query what really takes 2. So assuming that applies here, yeah, if you've got large datasets, doing several simple queries is better than fewer complex queries.

The MySQL query optimizer is just not there. It's not game-day ready the way Oracle and MSSQL is.

Isn't the first query generating a cartesian product ?

select e., u., l.* from entry e, user u, location l where e.location in (select l2.from location l2 where (l2.lat between x1 and x2) and (l2.lng between y1 and y2)) and e.user = u.id order by e.id desc limit 0,10

Where's the join condition for the Location table in the outer query ? Furthermore, what the heck does this mean:

select l2.from location

It looks like it's missing a column name, or a * wildcard.

I was pretty surprised by the results, though. Even our lowly product ElevateDB can handle optimizing correlated sub-queries like this pretty easily, and effectively treating them just like an inner join. I suspect that the query was specified incorrectly, although I guess it's certainly possible that MySQL is very bad at optimizing correlated sub-queries.

Getting rid of the 'IN' clause.

exactly. Using an IN statement with a subquery is textbook "how to make a query slow" - it forces a sequential scan. No-one who understands SQL would write a query like that.

It's certainly a cardinal sin on MySQL but commercial platforms have optimizations that make these correlated subqueries performant.

It's not inherent in SQL, it's down to the optimizer

MySQL supports spatial indexes exactly for circumstances like this:


I believe the first query was so slow because when using WHERE / IN - mysql will execute the subquery for each and every row, not do the subquery first as some might expect.

I'm reading the first query and maybe it's just early but why isn't that subquery just a join? It seems to be acting that way.

By definition all queries using JOINs can be re-written as sub-queries and vice versa. However, in some cases one is more intuitive, convenient or faster than the other. With MySQL you are better off with JOINs in most cases, and use subqueries only when there is a clear benefit and you understand why that would be faster. One exception to this is using a subquery with EXISTS, which is often way faster. While MySQL will tell you that it will look at all matching rows (when looking at EXPLAIN), it will only look at the first one it finds, so practically it ends up being faster.

It basically is; the technique used here is sometimes called a delayed join, which is also extremely effective on huge sorted queries.

Good point! Actually, in early stages of our development, it was accomplished using two different queries and we needed to combine them to a single query. It was just a scratch and maybe it caused a bias which prevented me to see shorter way.

Articles like this make me cry. Use a real database for god's sake (hint: it's free and the logo is an elephant).

Spoon-feeding trivial queries to MySQL is just wrong on so many levels.

it's not an elephant, it's a dolphin...

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact