

MySQL at Facebook - chuhnk
http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html

======
ecaron
I doubt this HN post could have a more boring title. The actual article is
very interesting: Facebook's DB developers talk about what it takes to serve
13 million queries per second, and share a lot of the useful tips they've
learned along the way to make scaling simpler.

There's a full video of the talk available at
[http://www.livestream.com/facebookevents/video?clipId=flv_cc...](http://www.livestream.com/facebookevents/video?clipId=flv_cc08bf93-7013-41e3-81c9-bfc906ef8442)

~~~
catshirt
How is the title boring? What could be more telling? Having "Facebook" in the
title makes it clear we're going to be talking about astronomical amounts of
data. "[insert any db] at Facebook" would be an interesting title in my
opinion.

The fact that it's MySQL makes it even more interesting, given the shift of
scalability interest to MongoDB, etc.

~~~
nphase
Maybe boring isn't the right word. Vague is probably better. "MySQL at
Facebook" _is_ a Facebook page, a Launchpad user, a Twitter user, an O'Reilly
MySQLConf talk, etc.

~~~
kamechan
Not a conspiracy theorist by any stretch, but I found it a bit humorous that
this came out just when Oracle offered a pricing plan for MySQL and,
especially, InnoDB.

------
qjz
"They figure out why the response time for the worst query is bad and then fix
it."

This can reap huge benefits and doesn't need to be difficult. Just enable the
slow query log in MySQL, use the EXPLAIN command to analyze the results, then
add indexes where appropriate. I was able to fix poorly indexed tables in a
vendor's application with dramatic results. In one case, a twenty-minute(!)
query was reduced to less than a second.
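
A minimal sketch of that workflow (the `orders` table and `customer_id`
column here are made-up examples, not from the vendor's application):

```sql
-- Enable the slow query log at runtime and log anything over 1 second.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- For a query that shows up in the log, inspect the execution plan.
-- "type: ALL" in the output indicates a full table scan.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- If no suitable index exists, add one, then re-run EXPLAIN to confirm
-- the plan now uses the new index instead of scanning the table.
ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);
```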

~~~
matwood
While your premise is correct, it's not always that simple. More indexes can
slow down updates and inserts. A table's overall usage pattern needs to be
examined before just adding more indexes. You might have fixed the
twenty-minute report query while slowing down some other, more critical query
that loaded or updated data.
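
One way to sanity-check that tradeoff (hypothetical `orders` table; the
`sys` view assumes MySQL 5.7+ or the sys schema installed separately):

```sql
-- Every secondary index must be maintained on each INSERT/UPDATE/DELETE,
-- so listing a table's indexes shows what every write is paying for.
SHOW INDEX FROM orders;

-- With the sys schema, indexes that are never read but are still
-- maintained on every write stand out immediately.
SELECT * FROM sys.schema_unused_indexes WHERE object_name = 'orders';
```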

~~~
qjz
Yes, this is a good point and I actually noted it in my issue tracker while
troubleshooting. I made sure that inserts & updates weren't negatively
affected to an unacceptable degree after adding the indexes. The overall win
was huge, because it eliminated wait times for a sizable staff and client
base.

------
thevivekpandey
"It is OK if a query is slow as long as it is always slow." I find this
enlightening. I think it is just one instance of the difference in emphasis
between Google and Facebook. Google, a technology-oriented company, would
minimize the average response time, while Facebook, a people-oriented company,
would minimize the unpredictability (by minimizing the variance).

~~~
davidamcclain
>> "It is OK if a query is slow as long as it is always slow"

I'm having trouble understanding the motivation. If a slow query is always
slow, then I'm always going to be kept waiting for that page/data. It seems
logical to worry about the queries that keep users waiting 100% of the time
rather than the queries that keep users waiting <100% of the time.

Does anyone care to explain why this is a good idea (for Facebook at least)?

~~~
bialecki
I'm sure the rationale here is that if a given query takes 100ms, they don't
focus on getting it down to 50ms, even if that's several times the average,
_because they know they can_. Certainly someone should focus on making that
query faster, but it's the more straightforward problem.

The harder problem is figuring out why that 20ms query suddenly balloons to
200ms. You can say, "no big deal, it only happens 1% of the time," but if you
don't know why, you could make changes to the system that cause it to happen
much more frequently and eventually bring the whole system down.

Also, there's a bit of UX here. People are much more frustrated by things they
don't understand and/or aren't used to. There are parts of GMail that are
always slow (archiving a lot of messages). I know this, so I know I have to
wait 5-10 seconds. What if sometimes it took 1 second and sometimes it took 20
seconds? What if it took 20 seconds 5% of the time? I'd probably always click
again and think something was broken. If it's always slow, I want it to be
faster, but at least I know what to expect.

------
helwr
Reducing variability is one of the main goals of classic quality
engineering.

See Deming's work on statistical process control used in wartime production
during WWII: <http://en.wikipedia.org/wiki/W._Edwards_Deming>

I'm glad Facebook is following this old-school engineering tradition.

------
morgo
The concept of variance reduction is not even new to computing. In
manufacturing they have Six Sigma. I believe the famous GE CEO's quote was
"Our Customers Feel the Variance, Not the Mean" (Google it).

This is talked about here:
[http://www.mysqlperformanceblog.com/2010/06/07/performance-o...](http://www.mysqlperformanceblog.com/2010/06/07/performance-optimization-and-six-sigma/)

------
codypo
I enjoyed the section on creating quality, not quantity, and its emphasis on
minimizing variance. I can see how these heuristics could be applied to most
startups.

The section on diagnosing should be taken with a grain of salt, though. If
your company ever gets to the point where you need to monitor everything at
subsecond level to catch problems or analyze and understand every layer of
your stack to see how it performs, you've already won. That amount of
attention to scalability means your company has a huge base of users. Not only
that, it means you have the large and impressive engineering resources to
devote to that problem.

That's definitely not my startup, and so the tools described, while definitely
useful (and probably fun to build!), aren't anything approaching a priority
for me. In the words of the stereotypical Yiddish grandmother, you should be
so lucky to have those sorts of problems!

------
MichaelGG
It'd be interesting to know some info on the hardware back end, such as number
of servers, storage system, etc. Also, how many servers does a typical query
touch?

------
preek
Does anyone know the throughput of the largest MSSQL installation? I'm
searching the web to show off a little information at work, but I can't find
anything that compares.

------
philwhln
"They measure how things work in operation. They can monitor at subsecond
levels so they catch problems.". Genius!

------
known
Does PayPal use MySQL?

~~~
ora600
Everyone is using MySQL. But I attended a talk by one of their DBAs where he
said that their large OLTP server, the one that processes the payments, is
Oracle. Single-instance, because Oracle RAC can't give them the low latency
they need.

They switch to the biggest machine IBM can give them every few months.

~~~
MichaelGG
>They switch to the biggest machine IBM can give them every few months.

Yea, almost all this "web scale" scalability can achieve those numbers because
they can partition, relax transactional requirements, and scale out. For
system-wide true transactions, scaling up is still the only option, it seems.

Although, H-Store/VoltDB claims some impressive numbers if your OLTP app fits
the requirements.

~~~
mzl
Just curious, do you have any good examples of very large-scale systems that
need system-wide transactions? I might be missing something obvious (I'm not
actually in the database area), but I couldn't come up with a good example.

~~~
skorgu
It depends on your definition of 'very large scale'. Certainly any large
financial/insurance company is going to be churning through great gobs of data
that really does have to be universally consistent. This is basically the
niche that keeps IBM's mainframe division in business.

I'd suspect that if those don't qualify as 'very large scale' to your
definition (which would be entirely justifiable) that such systems don't exist
because they're not currently possible. For example I don't think it's
possible to implement Facebook with SERIALIZABLE no matter how much money you
throw at the problem.

(I'd love to be proven wrong of course)

------
mobileed
I can't understand why anyone who is in the know would sign up for this. I
work at a Fortune 5 corp where Oracle was once king and is being replaced with
Microsoft SQL Server simply due to the outrageous price gouging. It's as if
Oracle is trying to squeeze every last penny out of its aging database as OSS
solutions chip away at its profits. PostgreSQL is, in my opinion, poised to do
this. MySQL just pales in comparison to Postgres, and Oracle is the kitchen
sink and then some -- even at a Fortune 5 we barely use all the "features" in
Oracle.

I'm just astonished that a company like Oracle, being around as long as they
have, could be so dumb. At the Fortune 5, Oracle has a similar practice of
gouging us on PeopleSoft licenses due to, in my opinion, lost DB sales.

Charging a customer a license by CPU core is just unethical.

It's no wonder... Good riddance.

