
MySQL Sharding at Quora - jinqueeny
https://www.quora.com/q/quoraengineering/MySQL-sharding-at-Quora?share=1
======
bcassedy
Is Quora really big outside the US or something? I find it difficult to
reconcile their scale challenges with the fact that I don't know a single
person that uses Quora outside of when the come up in google search results.

~~~
technion
I suppose I would reply with the question "is being the top rated Google
answer for all sorts of questions enough?".

That is, I can't imagine anyone I know having deliberately visited Quora and
used it as a networking site. But I can sure imagine most of the non-tech
community searching for a random fact and clicking a Quora link. I'd have to
wonder what that means for traffic/load. Certainly I'd expect you could cache
a substantive portion of page hits.

------
troughway
While this is a great read, I'm excited to find out more about the tech
challenges behind their login wall.

------
JohnBooty

        A couple of years ago, our infrastructure team 
        implemented the ability to move a table from one 
        MySQL host to another. This allowed us to have 
        multiple masters with different tables on different 
        masters, which we refer to as “vertical sharding”. 
    
        Each one of these masters, along with its slaves, 
        is called as a “partition” at Quora.
    

I nearly stopped reading here.

Naming things is hard, but why would they choose a term ("partition") that has
already existed for many decades in the same context?

That's nonsensical.

~~~
Ozzie_osman
What's nonsensical about it? A partition is just a logical split of your data.
You can do horizontal partitioning (aka "horizontal sharding"), like splitting
the same table to different databases by key. Or you can do vertical
partitioning, where you put different tables in different databases.

~~~
jeffdavis
I always thought vertical partitioning meant to split by columns (that is,
have fewer columns per table, and you need to join to get the original back).

~~~
JohnBooty
You're correct!

[https://stackoverflow.com/questions/18302773/what-are-
horizo...](https://stackoverflow.com/questions/18302773/what-are-horizontal-
and-vertical-partitions-in-database-and-what-is-the-differen)

[https://www.datanumen.com/blogs/in-depth-understand-of-
verti...](https://www.datanumen.com/blogs/in-depth-understand-of-vertical-
partitioning-in-sql-server/)

------
gokaygurcan
[https://www.raphkoster.com/2009/01/08/database-sharding-
came...](https://www.raphkoster.com/2009/01/08/database-sharding-came-from-
uo/)

Nice article from Raph Koster. Especially if you played Ultima Online by any
chance.

------
awinter-py
'shard user scale data, centralize or mirror small scale data' (the A1 A2 B C
D E diagram) is a useful point

'obvious in retrospect' I suspect to a lot of teams that add sharding to their
system

------
RoboTeddy
Would Google Cloud Spanner work well for a workload like this one? Why or why
not?

~~~
axegon_
By all means yes, I'd argue spanner would be the ideal candidate for the task.
It scales beyond anything else I've come across. Mind you, so do the costs of
running it. I've only used it for training exercises and numbers add up like
nothing else. IIRC running a production instance at the bare minimum would set
you back around 9k/month.

~~~
boulos
Disclosure: I work on Google Cloud.

That was probably before the change last fall [1] to have an SLA for
single-"node" instances. At $0.90/hr per "node" for a Regional setup in Iowa,
that's just over $650/month. Note that I put "node" in quotes, because it's
hard to remember that "num nodes" != num replicas [2].

We "require" (I don't know if this is a hard requirement, but let's assume it
is) a node per 2 TB of storage [3], so the example for Quora of "tens of TB
without counting replicas" would probably be 15 nodes at 30T and say 50 nodes
at most (100TB).

Assuming 50T as a balance, you'll need at least 25 nodes and let's assume
regional pricing (since the MySQL example isn't going to be multi-regional by
comparison). That's 25 x $.9/hr x 730 hours/month (~$16500) plus 50T of
storage at $.3/GB/month (another $15k). So just over $30k/month or $360k/year.
As for qps, we quote the _multi-regional_ variants at 7k read qps per node (so
175k in this example) and ~2k write (so 50k).

That's all in though, so you'd need to compare to the cost of running a fleet
of MySQL boxes too (at a minimum you'll also need ~75-ish boxes, just like the
25x3 for Spanner), 150T of raw storage (assuming 1 master and 2 read replicas
per shard, which is pretty common) and then Zone to Zone egress (I believe
Quora is on EC2). I'm assuming they're using gp2 volumes at $.10/GB/month, so
that's the same $15k/month of storage. Being cost-conservative and assuming
r5a's and 3-yr RIs, you pay about $18/month per thread, so we can buy up to
833 threads of r5a for the remaining $15k/month. That's maybe close to
feasible (~11 threads per box), but honestly you probably need 16+ or even 32
for traffic this high (depends on query caching).

tl;dr: I don't think Spanner's pricing is higher than the sharded mysql fleet
in this example! I do think it's not available on AWS or on-premises though,
so that's a fine disqualifier for Quora.

[1] [https://cloud.google.com/spanner/docs/release-
notes#Septembe...](https://cloud.google.com/spanner/docs/release-
notes#September_25_2019)

[2] [https://cloud.google.com/spanner/docs/instances#available-
co...](https://cloud.google.com/spanner/docs/instances#available-
configurations-regional)

[3]
[https://cloud.google.com/spanner/quotas#notes](https://cloud.google.com/spanner/quotas#notes)

~~~
exacube
Even at ~$360k/year using Spanner, I wonder if Spanner is cheaper than the
amortized cost of dedicating engineers to develop and maintain a custom MySQL
sharding solution, on top of the AWS bill.

------
roskilli
Since it's range based sharding, was Vitess ever considered (a popular CNCF
MySQL range sharding cluster approach that came out of YouTube)?

~~~
sethammons
Different company. We evaluated Vitess in comparison to ProxySQL. While Vitess
was very promising, it was a large refactor and an all or nothing proposition
to switch to it. ProxySQL allowed us to move to it piecemeal and adopt a
crawl, walk, run solution in adopting it.

~~~
roskilli
Interesting, didn't realize it imposed a lot of client side changes, I'm
curious to hear your experience on that? Was it the change in MySQL
compatibility that makes it an upfront cost?

With Vitess I believe most get started with just deploying VTGate and just
proxying their existing queries to their existing MySQL server, then once that
is working nicely, start to range shard and expand the cluster sitting behind
VTGate.

~~~
sethammons
It was several years ago and I don't recall the details. And it is always
possible we missed something or that things have changed on the Vitess side. I
do recall the impression that we would have to have a "cut over" and it would
be going from zero queries to full load. It seemed doable but leadership
decided the risk was too high.

~~~
sougou
We've come a long way in Vitess in terms of usability. And we're adding even
more tooling to support onboarding and migrations.

------
fovc
> The result is that we do very few joins in MySQL

I realize depending on your data model some of this is unavoidable once you
shard, but it sounds like they did this even before sharding.

What's the reason to do that? Seems like you either end up with long
transactions or doing MVCC in app code

~~~
toast0
As a sibling has noted, joins across database hosts with SQL is somewhere
between hard and impossible. So it made sense to discourage and remove them on
that basis, given that the datasize was growing/has grown beyond what can be
managed reasonably on a single host.

Another reason to avoid joins is that they it can be difficult to consistently
write joins that execute quickly. Often, a poorly performing join can be
written as a sequence of queries which each perform well, and joined with
reasonable performance on the client side. Usually, it's easier to scale the
client (webserver) tier than the database tier, so moving work to the client
makes sense for that reason too.

Of course, this has to be considered in the context of the service. These days
it's pretty easy to get a database server with 64 cores and a couple
terrabytes of ram, and several terrabytes of fast SSD storage. You can do a
lot with that, and you may never need to shard.

------
zerr
Comment from the representative (probably) of the "Chinese Quora" is
interesting as well :)

~~~
jinqueeny
Thanks! Posting the case study here for more information:
[https://pingcap.com/success-stories/lesson-learned-from-
quer...](https://pingcap.com/success-stories/lesson-learned-from-queries-
over-1.3-trillion-rows-of-data-within-milliseconds-of-response-time-at-zhihu/)

------
TekMol
Interesting that they still use MySql and did not switch to MariaDB.

What is HNs opinion on MySql vs MariaDB?

~~~
Mailtemi
Their site(use case) looks almost similar to stackoverflow.com. The
stackoverflow.com has few MS SQL servers with a Memcached as cache. I guess if
reads are the heavy load it makes perfect sense just to continue with your
current SQL DB + sharding.

~~~
ramraj07
Did stack overflow even do any sharding?

~~~
Zenyaku
Here is what stack overflow does:
[https://nickcraver.com/blog/2016/02/17/stack-overflow-the-
ar...](https://nickcraver.com/blog/2016/02/17/stack-overflow-the-
architecture-2016-edition/) It's from 2016, but probably hasn't changed much.
Also StackExchange:
[https://stackexchange.com/performance](https://stackexchange.com/performance)

------
makz
This is going to be very useful for one of the projects I’m working on, thanks
for sharing

