
Pinterest Architecture Update - 18M Visitors, 10x Growth, 12 Employees, 410 TB - Anon84
http://highscalability.com/blog/2012/5/21/pinterest-architecture-update-18-million-visitors-10x-growth.html
======
pajju
They have a very interesting stack, as seen on Quora [1]:

-Python + heavily-modified Django at the application layer

-Tornado and (very selectively) node.js as web-servers.

-Memcached and membase/redis for object and logical-caching

-RabbitMQ as a message queue.

-Nginx, HAproxy and Varnish for static-delivery and load-balancing.

-Persistent data storage using MySQL.

-MrJob on EMR for map-reduce.

[1] <http://www.quora.com/Pinterest/What-technologies-were-used-to-make-Pinterest>

~~~
edwinnathaniel
Pardon my ignorance: what is interesting about Pinterest's stack?

I think by now most startups, YC or not, in Silicon Valley will pretty much
have a similar setup:

1) Choose the main web-stack (Rails or Django)

2) Choose the API framework (node.js, or something else)

3) Memcached for caching (or some NoSQL)

4) A message queue (ZeroMQ, RabbitMQ, or something ...)

5) nginx, HAProxy, Varnish, (or similar technology)

6) Hadoop for map-reduce (only if you need to go that path...)

~~~
raverbashing
Oh, I _wish_ it was that simple.

You can't just mix and match (well, you can, _theoretically_).

Let's say you began with Django, right? Or there was something already ready
in Django.

And then you begin to see the warts. But ok, you keep churning along.

And the more you churn the more of a specialist in the deficiencies of each
technology you become. Like the fact that Django's ORM runs like a dog.

Or see all the "we moved to MongoDB and we regret it" discussions.

From that list, there are two technologies I would recommend strongly (if you
know what you are doing): nginx and Redis

The rest needs to be handled with care. And Redis is very powerful if you know
how to use it, but it's easy to go the "lazy way" with it, and it will be fast,
but not as fast as it could be.

~~~
megaman821
Yes, but the Django ORM helped them launch faster. There is nothing wrong with
using it to start with and then writing your own SQL queries when you need to
scale, or even changing your data model to accommodate scaling.
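
A minimal sketch of that progression, using the standard-library `sqlite3`
module as a stand-in. This is not Django and not Pinterest's code; the
`boards`/`pins` schema and all function names are hypothetical. The first
function mimics the one-query-per-object pattern that an ORM makes easy to
write, the second gets the same answer from a single hand-written query.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database (hypothetical schema, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE boards (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE pins (id INTEGER PRIMARY KEY, board_id INTEGER, url TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO boards (id, name) VALUES (?, ?)",
        [(1, "recipes"), (2, "travel"), (3, "empty")],
    )
    conn.executemany(
        "INSERT INTO pins (board_id, url) VALUES (?, ?)",
        [(1, "a"), (1, "b"), (2, "c")],
    )
    return conn


def pin_counts_naive(conn):
    """One COUNT query per board: fine at launch, costly with many boards."""
    counts = {}
    for board_id, name in conn.execute("SELECT id, name FROM boards").fetchall():
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM pins WHERE board_id = ?", (board_id,)
        ).fetchone()
        counts[name] = n
    return counts


def pin_counts_sql(conn):
    """Same result from one hand-written query using JOIN and GROUP BY."""
    rows = conn.execute(
        """
        SELECT b.name, COUNT(p.id)
        FROM boards AS b
        LEFT JOIN pins AS p ON p.board_id = b.id
        GROUP BY b.id, b.name
        """
    )
    return dict(rows.fetchall())
```

Both functions return the same mapping of board name to pin count; the second
issues one query regardless of how many boards exist.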

~~~
cookiecaper
I _hate_ the Django ORM and I don't know why Django doesn't just deprecate it
and switch to SQLAlchemy as the new default, which is a much, much better ORM
and used primarily by non-Django Python apps.

~~~
Marwy
Why don't YOU try?

~~~
cookiecaper
Why don't I try what? Deprecating the Django ORM? I've already done so on a
personal level (in fact, I avoid Django wherever possible), but since I don't
control the Django project, I can't deprecate it for everyone, as I suggested
in the parent comment. :)

------
whalesalad
It's nice to see that two companies valued at >= $1 billion are running a
Python/Django stack. #teampony

~~~
lionheart
Yeah, I'm actually in the process of switching to Python/Django from Rails, so
it's great to hear how well it scales.

~~~
pajju
Check out this list of sites running Python/Django [1][3][4]:

-Instagram[0]

-Pinterest

-Disqus

Disqus.com - Disqus serves over 3 billion page views, and more than 500
million unique visitors a month on its Django stack. As far as we know we are
the largest installation out there.

-Mozilla[2]

addons.mozilla.com and support.mozilla.com

250k+ add-ons, 150 million views per month, 500+ million API hits per day
(Firefox checking for updates!)

-Justin.tv (they moved to Django from Rails)[3]

-NASA

-National Geographic

-Canonical

-Bitbucket.org

-Discovery Networks

-Intel, AMD, HP, IBM

-Lexis-Nexis

-The Library of Congress

-The New York Times

-Orbitz

-PBS

-Rdio: Huge traffic Radio site.

-VMWare

-Walt Disney

-The Washington Post.

-lanyrd.com

-OSQA sites. OSQA is an open-source Stack Overflow clone, a Q&A community platform. AFAIK some 5K sites are powered by the OSQA stack. OSQA is built on Django.

-YouTube, LinkedIn, Google, Netflix, Amazon: Python stacks.

-Gmail, Google Calendar, AdSense, AdWords, Android Marketplace, to name a few of the heavy hitters from Google, are on Python - Google App Engine.

[0] <http://techcrunch.com/2012/04/12/how-to-scale-a-1-billion-startup-a-guide-from-instagram-co-founder-mike-krieger/>

[1] <http://jacobian.org/writing/django-community/django-community-2012/>

[2] <http://reinout.vanrees.org/weblog/2011/06/06/large-mozilla-sites.html>

References-

[3] <http://stackoverflow.com/questions/886221/does-django-scale>

[4] <http://www.quora.com/Django/What-is-the-highest-traffic-website-built-on-top-of-Django>

~~~
irahul
EDIT: I upvoted the post, and now it's no longer in gray. So, it was just one
downvote.

Why the downvotes? Parent post is just pointing out sites which he/she knows
to be using Python and/or Django. Some of the entries are incorrect (Gmail?),
or give the impression that a site runs on Python when only small sub-projects
use it (LinkedIn? Amazon?). There are some mistakes, but on the whole, I don't
see anything wrong with someone pointing out something relevant to the
discussion, even if it involves bragging about something he is associated
with.

------
al_james
Does anyone else feel that 410TB of _user data_ seems quite a lot? If I have
my maths right, even if all the 80 million objects are user data (as opposed
to, say, logs) that's 5.3 MB per object. Considering that most Pinterest photos
are from the web, that seems quite big.
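
A quick check of that arithmetic, as a sketch. The 410 TB and 80 million
figures come from the article; whether "TB" means decimal terabytes or binary
tebibytes is an assumption either way, so both are computed. The function and
variable names are illustrative.

```python
def avg_object_size(total_bytes, object_count):
    """Average size per stored object, in bytes."""
    return total_bytes / object_count


TB = 10**12    # decimal terabyte
TIB = 2**40    # binary tebibyte
OBJECTS = 80_000_000

# Decimal reading: 410 TB spread over 80M objects, expressed in MB (10**6 bytes)
decimal_mb = avg_object_size(410 * TB, OBJECTS) / 10**6

# Binary reading: 410 TiB spread over 80M objects, expressed in MiB (2**20 bytes)
binary_mib = avg_object_size(410 * TIB, OBJECTS) / 2**20

print(f"{decimal_mb:.2f} MB per object (decimal units)")
print(f"{binary_mib:.2f} MiB per object (binary units)")
```

Either reading lands a little above 5 MB per object, which matches the figure
above closely enough for the point being made.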

~~~
moe
Yes, that number is ridiculous.

For reference, that's about four times bigger than the iTunes music catalog
(20 million MP3 files * 5MB average filesize = 100TB).

~~~
skeletonjelly
5MB seems a bit small, they'd also have lossless copies right?

~~~
moe
Yes, their total storage consumption is bigger, but the MP3-part of the
library is in that ballpark.

------
bbayer
These kinds of articles make me feel so stupid. Even though those technologies
are ready to go, making them work smoothly without any interruption always
seems hard to me. I believe there have to be lots of tips and tricks (in
other words, you have to be experienced with all of them, or I am very lazy).
I was wondering if any common recipe exists, or if there is someone who can
answer a couple of my high scalability questions.

~~~
raverbashing
Yes, see "heavily-modified Django"

(even though this is probably around scalability and connecting with other
technologies)

Scalability and uptime are still hard.

Still, Pinterest is sitting on great infrastructure (EC2, Elastic MapReduce)
and using it to the fullest.

------
bsg75
> Sharding is used, a database is split when it reaches 50% of capacity,
> allows easy growth and gives sufficient IO capacity

Nice, compared to the usual, "The database was at 100% capacity, then we tried
to shard/partition, and it did not go too well."
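
A toy sketch of that split-at-50% policy. The threshold comes from the quoted
article; everything else is an assumption, since the article does not describe
how data is actually divided. Here a shard at or above the threshold is split
into two shards of equal capacity with the data divided evenly, and the
`Shard` class and `split_full_shards` function are hypothetical names.

```python
from dataclasses import dataclass

SPLIT_THRESHOLD = 0.5  # from the quoted article: split at 50% of capacity


@dataclass
class Shard:
    capacity_gb: float
    used_gb: float = 0.0

    @property
    def utilization(self):
        """Fraction of this shard's capacity currently in use."""
        return self.used_gb / self.capacity_gb


def split_full_shards(shards, threshold=SPLIT_THRESHOLD):
    """Return a new list in which every shard at or above the threshold is
    replaced by two shards of the same capacity, each holding half the data.
    (Even halving is an assumption made for illustration.)"""
    result = []
    for shard in shards:
        if shard.utilization >= threshold:
            half = shard.used_gb / 2
            result.append(Shard(shard.capacity_gb, half))
            result.append(Shard(shard.capacity_gb, half))
        else:
            result.append(shard)
    return result


shards = split_full_shards([Shard(100, 50), Shard(100, 20)])
```

Under these assumptions, splitting at 50% leaves each new shard at 25%
utilization, so there is headroom both for growth and for the IO load of the
split itself.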

------
teoruiz
According to the article, Pinterest is spending more than $30k a month on AWS
EC2 to support 18M visitors/month.

Data: $52/h (peak time, let's say 18 out of 24 hours) and $15/h (night time,
let's say 6/24).

Edit: as pointed out in the comments, $30k/month would only be the EC2 costs.
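
The estimate can be reproduced as follows. The hourly rates are the ones
quoted from the article; the 18/6 hour split is the guess stated above, and
the 30-day month and the function name are additional assumptions.

```python
def monthly_ec2_cost(peak_rate, peak_hours, off_rate, off_hours, days=30):
    """Monthly cost in dollars from hourly rates and hours per day at each rate."""
    daily = peak_rate * peak_hours + off_rate * off_hours
    return daily * days


# $52/h for 18 peak hours plus $15/h for 6 night hours, over 30 days
cost = monthly_ec2_cost(peak_rate=52, peak_hours=18, off_rate=15, off_hours=6)
print(f"${cost:,} per month")
```

That works out to $1,026 per day, or $30,780 over a 30-day month.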

~~~
carson
You are just counting EC2 cost. The AWS cost for 410TB of S3 storage is around
$39k. You would need to add in BW cost on top of that.

It is also interesting that they seem to be using Akamai for a CDN instead of
CloudFront, so it is not a completely AWS-based solution.

I wish they went into what they are storing in S3. 410TB is a lot of storage.
My initial guess was cached images but 80M objects breaks down to 5MB per
object and that is a lot more than what is needed for image caching.

~~~
kitsune_
From these figures they seem to burn over 100k per month on outsourced cloud
services alone. Holy shit.

~~~
tnash
Yeah...that's insane. Since they're a "pre-revenue" company, they're just
burning money.

~~~
moe
"Burning" seems like an understatement.

Looks like a switch to dedicated hardware would amortize within... 3 months.

~~~
teyc
Given their growth rates and cash in the bank, and the fact it is still
looking for a business model, it is probably better for them to focus on their
key problems before working on other issues. If they were self funded, like
37signals, and are running operations where they'd tighten the screws on cost,
then the focus is different again.

~~~
moe
I disagree.

Yes, they obviously have quite a few loose screws (read: whoever invested
100MM in that).

Either way, this is not a matter of "tightening". It's a matter of hiring an
admin and having him not only pay for himself after 3 months, but for 1-2
other employees, too.

Yes, when you have 100MM in the bank then a mundane couple dozen thousand
dollars a month might seem to matter less. But I can't think of a company
where that kind of decadence has led to anything positive in the mid term.

------
rudiger
It's cool that we're seeing companies reach 1 billion USD valuations with 12
employees.

~~~
blantonl
It will be even _cooler_ when we see companies reach 1 billion USD net-income
with 12 employees.

~~~
edwinnathaniel
I heard the Microsoft Visio team probably has fewer than 50 people working on
it (granted, the Microsoft name and the Office brand provide a huge selling
point).

Visio almost hit $1B according to an insider... (can't verify).

~~~
iusable
Now that's interesting! Gots to dig into this little tidbit.

------
ankimal
Small, cohesive, goal-oriented teams with clear, well-defined goals can achieve
so much.

------
neovive
Interesting. I'm migrating some old PHP apps over to Python and have been
learning Flask + SQLAlchemy. Why would a site with so much traffic choose a
full-stack framework like Django that required so much modification?

~~~
skeletonjelly
Because when they started they didn't have the traffic, and Django was the
quicker way to get to a running product. It's probably easier now to cut back
or change Django functionality than to rewrite.

