
Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years - aespinoza
http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html
======
RyanZAG
I'm fairly amazed by some of the choices - in particular the single huge user
table that is hit on every profile page view to map a URL to a shard. I'm
guessing it has hot backups and similar - but wow - talk about a single point
of failure. I guess having it like that actually makes it easier: if it fails,
it should be trivial to get it back up quickly on a backup machine.

The whole setup seems a lot simpler than nearly every 'web scale' solution
I've heard discussed.

EDIT: I'm not saying that is a bad thing - it's a good thing. Most of the
articles around scaling on HN often deal with very complex Cassandra style
technical solutions. This article is the opposite with a very 'just do it the
simplest way' feel to it. One big mysql table for all users (probably with
some memcache in front) is the sort of thing you can put together in an hour
or two - yet it is the very core of their system.

I (and I bet a lot of other engineers reading HN) try to focus too much on
perfect implementations - if someone asked me to design the shard look-up for
something as heavily hit as this, I'd probably reach for a distributed in-
memory DB with broadcasted updates - and I'd probably be wrong to do so. The
mysql/memcache approach is just going to be simpler to keep running.
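A minimal sketch of that mysql/memcache approach (all names are my own, not Pinterest's). Plain dicts stand in for memcache and the MySQL user table so the sketch is runnable; in production these would be a memcache client and a real query:

```python
cache = {}  # stand-in for memcache

USER_TABLE = {"alice": 12, "bob": 7}  # stand-in for the big MySQL user table

def db_lookup_shard(username):
    # stand-in for: SELECT shard_id FROM users WHERE username = %s
    return USER_TABLE[username]

def shard_for(username):
    """Cache-aside lookup: check memcache first, fall back to the DB."""
    shard = cache.get(username)
    if shard is None:
        shard = db_lookup_shard(username)  # cache miss: hit the DB once
        cache[username] = shard
    return shard
```

Most profile views hit only the cache; the single authoritative table is consulted only on misses, which is a big part of why the simple design holds up.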

~~~
coolsunglasses
You don't really want to get into the alternatives. It's pretty common to
shard everything but the core user data (email, PBKDF2, username) and scale
that with slaves and vertical upgrades. SSDs can take you a long way if you're
fastidious about offloading everything else from the machine responsible for
keeping user accounts consistent.

The alternatives aren't nice.

~~~
aespinoza
I agree. Usually it is better to scale vertically on the user data, even if it
seems painful. Alternatives like partitioning are not really that easy either;
there is pain involved as well.

Of the existing solutions, I don't think one is better than another. The
choice in the end is which one fits best with your team and infrastructure,
and especially which disadvantages you are willing to put up with.

------
cagenut
I really like how when you look at it, most of their scaling was _removing_
things from the environment and really honing the platform down to what it
needed to be.

------
jacques_chester
> _Architecture is doing the right thing when growth can be handled by adding
> more of the same stuff. You want to be able to scale by throwing money at a
> problem which means throwing more boxes at a problem as you need them. If
> your architecture can do that, then you’re golden_

My understanding is that the term of art is horizontal scalability.

------
sontek
How does SOA improve scaling? I've never had to build anything as big as
Pinterest and it seems interesting to me.

I don't quite understand how it reduces connections. If you have your front-
end tier connecting to your service tier, you'll still have the same number of
requests going to the database, just with a little middle-man... if I
understand correctly?

~~~
aaronbrethorst
I'm no expert, but hopefully, by abstracting different services out of your
core app and making them available through some connection protocol (Thrift, a
message queue, HTTP, whatever), you can more easily scale up and scale out
systems that become hotspots, and consumers of that service will be none the
wiser, since they only interact with it through the published API.

Looking at it another way, when I talk to the GitHub API, I have no idea what
sort of database I'm ultimately talking to. Their interface abstracts away all
of that complexity. Each request they receive from me might be handled by a
different machine for all I know. It makes no difference.
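A toy illustration of that point (all names hypothetical): the consumer sees only the published API, so the storage behind it can be swapped, sharded, or moved to another machine without callers noticing.

```python
class UserService:
    """Published API; callers never touch the backing store directly."""

    def __init__(self, store):
        self._store = store  # could be MySQL, Redis, another service...

    def get_user(self, name):
        return self._store.get(name)

# Consumer code depends only on get_user(); a dict-backed store here
# could be replaced by a real database client with no caller changes.
svc = UserService({"octocat": {"repos": 8}})
assert svc.get_user("octocat") == {"repos": 8}
```

That indirection is the sense in which a service tier can be scaled independently: the hot service gets more boxes, and nothing upstream has to know.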

------
jnankin
2 questions: 1) What do they do with the Pinterest-generated ID? Do they store
it with each row in addition to the local ID? 2) Why randomly select a shard?
Shouldn't you do this based on DB size across all boxes, or at least based on
physical space left across all boxes?

~~~
yashh
We decompose 64-bit IDs into shard_id, type, and local_id. We connect to the
shard, go to the table, and read the row for the local_id.

We randomly select a shard for new users only. Of course we capacity-plan
this, open new shards, and eventually even out users.
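The decomposition above can be sketched as bit shifts and masks; the field widths here are illustrative assumptions, not necessarily Pinterest's exact layout:

```python
# Assumed layout of a 64-bit global ID: [shard_id | type | local_id]
SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 36  # 62 of 64 bits used

def compose(shard_id, type_id, local_id):
    """Pack the three fields into one 64-bit global ID."""
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id

def decompose(global_id):
    """Recover (shard_id, type_id, local_id) from a global ID."""
    shard_id = global_id >> (TYPE_BITS + LOCAL_BITS)
    type_id = (global_id >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    local_id = global_id & ((1 << LOCAL_BITS) - 1)
    return shard_id, type_id, local_id
```

Since the shard ID is embedded in the object's ID, routing a request needs no lookup table at all: decompose, connect to that shard, and read the row by local_id.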

------
bloodredsun
What is most interesting to me is that in many ways they have reinvented
Couchbase. I think that the only reason they didn't go with this technology
was that the financial cost at their scale was too high.

------
nasalgoat
I wonder if their dislike for Cassandra is based on previous versions pre-2.0.
From my perspective of looking at it as it stands now, it's pretty compelling.

~~~
bloodredsun
Not if you want strong consistency. Cassandra's performance sucks in
comparison with the likes of MongoDB or Couchbase when reading with strong
consistency, since the clients have no idea of the server topology.

~~~
cnlwsu
Umm, what? Cassandra is just as fast or faster (depending on both
configuration and load) compared to MongoDB with consistent reads/writes.
Definitely with writes, but reads get tricky.

~~~
bloodredsun
Firstly, these sorts of applications are always going to be more read-heavy,
so the reads are more important. Secondly, Cassandra cannot and will not be as
good as something like Couchbase, since the client libraries are not aware of
the server topology and so cannot make direct connections to the server
hosting the data. This means that, depending on your consistency requirements,
Cassandra will be merely okay to occasionally terrible, depending on whether
you care about the 99th percentile. This behaviour was one of the reasons my
company moved away from Cassandra.

This is probably the best benchmark of Cassandra/MongoDB/Couchbase:

<http://www.slideshare.net/renatko/couchbase-performance-benchmarking>

~~~
pkolaczk
This benchmark is pure marketing. Cassandra clients can be token-aware and
connect directly to the right node.

If you want a scientific, independent, peer-reviewed NoSQL benchmark, read
this: <http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf>

Cassandra is a clear winner here.

~~~
bloodredsun
No, the benchmark is not pure marketing. Why would you claim that it is? Apart
from Astyanax, which clients are token-aware?

That paper is very useful, so thanks for posting the link, but it has a number
of issues as I see it.

1) It considers Cassandra, Redis, VoltDB, Voldemort, HBase, and MySQL. It does
not cover either MongoDB or Couchbase.

2) Latency values are given as averages and do not show p95/p99. In my
experience, Cassandra in particular is susceptible to high latency at those
percentiles.

3) Even considering average values, the read latency of Cassandra is higher
than you would see with either MongoDB or Couchbase.

4) Cassandra does not deal well with ephemeral data. There are issues while
GC'ing large numbers of tombstones, for example, that will hurt a long-running
system.

The long and short of it is that Cassandra is a fantastic system for write-
heavy situations. What it is not good at is read-heavy situations where
deterministic low latency is required, which is pretty much what the Pinterest
guys were dealing with.

~~~
pkolaczk
It is marketing, because Couchbase is a featured customer of Altoros, the
company that ran the benchmark. And the rule of thumb is: never trust a
benchmark done by someone affiliated with one of the benchmarked systems.
Obviously they would not have published it if Couchbase had lost; they would
have had to be insane to.

Another reason it is marketing is that it lacks essential information on the
setup of each benchmarked system. E.g. for Cassandra I don't even know which
version they used, what the replication factor was, what consistency level
they read at, or whether they enabled the row cache (which decreases latency a
lot). Cassandra has improved read throughput and latency by a huge factor
since version 0.6 and is constantly improving, so the version really matters.

------
anderspetersson
Great talk; it inspired me to simplify my tech stack.

Pyres looks like a good alternative for Celery in the Python world when you
don't need Celery's cron-like features.

------
dlhavema
Very cool. I thought Pinterest came out of nowhere and is now everywhere, but
I didn't realize how fast they did it and how large they've grown.

thanks for posting!

------
fotcorn
Does anyone know what programming language and framework they use? Still
Python+Django? PyPy?

~~~
goronbjorn
From what I've read, it's Python and "heavily modified" Django. I recall one
of their engineers saying that, if they were to do it again, they would've
gone with a more lightweight Python framework because of how many heavy
modifications they made.

~~~
misiti3780
I have read that also (multiple times, actually) and the statement never made
sense to me. Of course, in hindsight, you would start with something like
Flask because you ran into all of the scaling problems with Django, but there
would be no way to fix these problems before they occurred - and if you did
try to fix them beforehand, it would be premature optimization. It seems like
an example of what Taleb calls the "Narrative Fallacy":

<http://cij.inspiriting.com/?p=364#>

------
jnankin
What happens if one shard contains a whole slew of large companies (Macy's,
Gap, etc.), or over time users on one shard grow to the point that they cannot
be contained within one physical machine? Are they just assuming that this
will be very unlikely?

~~~
yashh
No, not at all. We already have this problem. Right now we are able to take
the DB hit, but we will eventually attack this problem from the cache
perspective: we will have replicated caches for this scenario. In the worst
case, if a shard is overloaded, we stop creating new users in it and put it on
a dedicated physical machine. I don't think we will let the situation get
there.
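A hypothetical sketch of that policy (names mine): new users are placed only on "open" shards, and an overloaded shard is closed to new signups while still serving the users it already has.

```python
import random

# Shards currently accepting new users; existing users on a closed
# shard are unaffected, since their data never moves.
open_shards = [0, 1, 2, 3]

def pick_shard_for_new_user():
    """Random placement, as described in the thread above."""
    return random.choice(open_shards)

def close_shard(shard_id):
    """Stop placing new users on an overloaded shard."""
    if shard_id in open_shards:
        open_shards.remove(shard_id)
```

Because IDs embed the shard, closing a shard requires no data migration: it just falls out of the candidate list for new signups.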

------
garydarna
Great article, thanks for sharing.

------
dmourati
More specifics about the decision to dump MongoDB would be extremely useful.

------
rikacomet
A simply awesome article that I'll refer back to again and again. Can someone
please refine this with their comments as well? It would be really helpful to
point out any mistakes Pinterest made at any point; what I want to know is
what they could have done better. Also, business is a dynamic environment, so
some things might have changed, so can someone please also point out if
something was right when Pinterest did it but there are now better
alternatives? Thanks a lot in advance. I'll be reading many of the comments
carefully.

~Rika

