

Everybody thinks SQL joins are slow-is it because MySQL doesn't have hash joins? - fatalmind
http://use-the-index-luke.com/sql/join/hash-join-partial-objects

======
jbellis
No, the author has misunderstood "joins are slow." (Edit: not the post author,
but the HN submitter. This reaction made more sense before the submission
title was changed.)

The point is that joins are slow _when you scale past one machine_ because the
data you are joining will be on different nodes.

(Yes, I'm familiar with pk-based sharding or entity groups, or even more
sophisticated partitioning like volt's. None of these can always always be
made to fit your data; I'm talking about the general case here.)

~~~
tedjdziuba
I think the author's point is that MySQL JOINs are slow if you are joining
more than 2 tables.

More mature databases like Oracle or PostgreSQL can eat 4 or 5 way JOINs for
breakfast.

~~~
PaulHoule
I definitely don't have trouble finding queries that are 50x slower on PgSQL
than MySQL so I don't think it's as simple as saying one is more mature than
the other in all ways.

That said, if you're doing N-way joins with N big you probably want bitmap
index support too, which is another thing missing in MySQL, although I've
written libraries that, for certain specialized cases, can fake it.

~~~
gtuhl
Postgres has bitmap index support baked in as well and it works really well.
I'd be awfully interested in knowing where MySQL beats it by 50x at anything.

~~~
spudlyo
Some MySQL vs PG tests done by Domas (of Facebook & Wikipedia fame) show some
interesting differences between MySQL and PG for an in-memory read only test.
I don't think there are any 50x differences though.

<http://dom.as/2010/11/08/random-poking/>

~~~
gtuhl
The tough thing about comparing these two is that it is hard to find a
benchmark (I don't know of one) that works similarly with them both. Sysbench
is somewhat of a standard for testing MySQL configs but Postgres does not
handle it well.

Though I tend to prefer Postgres don't get me wrong - MySQL has some
advantages.

For raw pkey lookups, especially range selects against pkeys with lower
amounts of concurrency InnoDB's speed is unmatched. It should be given that it
is completely laid out on disk specifically for that to the detriment of other
features.

MySQL's replication is extremely sturdy. On paper the older method (statement-
based) sounds incredibly fragile but in practice it works extremely well and
has been proven on countless projects. mmm makes it almost too easy to setup
replication and failover.

I feel that MySQL stalled out for several years in the 5.0-5.1 period where
poorly engineered features were bolted on to create a product with too many
pathological cases that destroyed performance to keep track of. All these
features were available but experience taught you to avoid most joins, avoid
most subselects, avoid most usages of views, etc.

That said, v5.5 has a lot of nice improvements and can actually scale up to
more than 8 cores (I've gotten linear improvement up to about 32 cores and
that is what Oracle puts in their white papers as well). Percona and Facebook
are releasing nice patches and branches, and forks like Drizzle are reaching
GA and doing good things as well. So I think it is headed in the right
direction again.

------
russell
I recently had a similar problem. Users wanted to export a highly normalized
structure to a csv file for munging in a spreadsheet. They were limited to 200
rows, because a thousand rows would bring everything to its knees. My first
cut was to move from Hibernate to pure SQL, but if I tried 1000 rows it went
away for a long time. I dont know how long because I killed it after 15
minutes. The real killers seemed to be lef outer joins.

I then split the query into several temporary tables. The first table
contained all the rows that I was interested in. The other tables replaced the
left outer joins and nasty beasts like group concatenates and queries that
turned aggregate results into separate fields. The temp tables were used to
update the first table. Runtime went from a significant part of an hour to
under 5 seconds.

------
hoop
Based on the majority (since the title is going for "everybody") of people who
drop into #mysql on any given day to ask about slow queries, I'd argue that
the majority of people who thinks MySQL joins are slow still don't have proper
indexes and are still using their un-tuned, distro-provided my.cnf

------
AlisdairO
It's worth mentioning that MySQL (iirc) _does_ support sort-merge joins, which
aren't mentioned by this article. So it's not a case of (index) nested loops
joins or nothing. While hash joins are usually significantly faster than sort-
merge, they have a variety of similar characteristics, and don't suffer from
the mash-the-b-tree issue that index nested loops do. Overall this makes me
doubt that the lack of hash joins is the fundamental reason some people think
joins are slow.

~~~
nerfhammer
MySQL only supports nested inner loop joins.

~~~
AlisdairO
...ah, I just bothered to check it out in more detail, and you're kind-of
right. MyISAM only supports nested loops (which I have to say surprised me),
but InnoDB (which, let's face it, is the backing store that allows mysql to
qualify as a 'real' RDBMS) supports hash joins.

------
mrspandex
Not directly related, but this is a great site to peruse. Great code examples
and a great perspective on database performance.

------
chopsueyar
Is the answer in the domain name?

