

How Instagram Feeds Work: Celery and RabbitMQ - thikonom
http://blogs.vmware.com/vfabric/2013/04/how-instagram-feeds-work-celery-and-rabbitmq.html

======
wuliwong
They have overstated the naivety of the "naive" approach.

The article states:

'directly fetch all the photos that the user followed from a single,
monolithic data store, sort them by creation time and then only display the
latest 10'

That isn't how the query would work. It implies that the query would return
all the results, when clearly it would only return 10. I haven't read all the
details, and I'm sure Instagram does something better than this basic SQL now,
but it is silly in a technical article to overstate the problem in such an
obviously incorrect way.
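
For illustration, here is a minimal sketch of the kind of basic query meant above, run against an in-memory SQLite database. The table names, column names, and sample data are assumptions made up for this example and are not Instagram's actual schema. The point is only that `ORDER BY ... LIMIT 10` hands back ten rows to the application, not every photo from every followed user.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
        CREATE TABLE photos (
            id INTEGER PRIMARY KEY,
            user_id INTEGER,
            created_at INTEGER
        );
        CREATE INDEX photos_by_user_time ON photos (user_id, created_at);
    """)
    # User 1 follows users 2 and 3, but not user 4.
    conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3)])
    # Thirty photos spread across users 2, 3 and 4; created_at equals the id.
    rows = [(i, 2 + (i % 3), i) for i in range(1, 31)]
    conn.executemany("INSERT INTO photos VALUES (?, ?, ?)", rows)
    return conn


def latest_feed(conn, user_id, limit=10):
    """Return the ids of the newest photos from users that user_id follows."""
    cursor = conn.execute(
        """
        SELECT p.id
        FROM photos AS p
        JOIN follows AS f ON f.followee_id = p.user_id
        WHERE f.follower_id = ?
        ORDER BY p.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    print(latest_feed(build_demo_db(), 1))
```

The database still has to locate and order the candidate rows, so this says nothing about how well the query scales; it only shows what the application receives.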

------
carbocation
In short, the feeds are heavily denormalized and are constructed when new
photos are added by people you follow, rather than at request time.

They seem to favor using disk space and saving on processing time and memory.

~~~
mattdw
Which was where Twitter had to go in their first major refactor, as well.
Originally timeline views would be built on-demand, but that approach proved
unscalable very quickly, so they moved to preemptively building timelines by
copying tweets everywhere they needed to go at publish-time rather than
request-time.

It's basically just an inversion; instead of the cache being a read-through
layer over the database, the database becomes a write-through layer over the
cache.
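
A rough sketch of that inversion, assuming nothing about either company's real code: the `FeedStore` class, its method names, and the ten-item cap are all hypothetical, and plain in-memory dictionaries stand in for whatever queue and cache a production system would use. Publishing copies the new item into each follower's precomputed feed, so a read is just a lookup.

```python
from collections import defaultdict, deque

FEED_LENGTH = 10  # hypothetical cap on how many items each feed keeps


class FeedStore:
    """Toy fan-out-on-write feed store (illustrative only)."""

    def __init__(self):
        # author -> set of users who follow that author
        self.followers = defaultdict(set)
        # user -> newest-first feed, trimmed automatically to FEED_LENGTH
        self.feeds = defaultdict(lambda: deque(maxlen=FEED_LENGTH))

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def publish(self, author, photo_id):
        # The work happens at publish time: one write per follower.
        for follower in self.followers[author]:
            # appendleft on a full deque drops the oldest item on the right.
            self.feeds[follower].appendleft(photo_id)

    def read_feed(self, user):
        # Reading involves no joins and no sorting, only a copy of the feed.
        return list(self.feeds[user])


if __name__ == "__main__":
    store = FeedStore()
    store.follow("alice", "bob")
    for photo_id in range(12):
        store.publish("bob", photo_id)
    print(store.read_feed("alice"))
```

The trade-off is visible in `publish`: an author with many followers triggers many writes, which is why this kind of work tends to be pushed onto a background task queue rather than done inside the upload request.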

------
megaman821
It seems that these problems would be better solved at the database level. It
is just that the open source databases lack good implementations of
materialized views and data replication to slaves. Instead you have this more
fragile solution involving Postgres, RabbitMQ, and Redis to accomplish
nearly the same thing.

------
acoyfellow
This post is great, but it's from April 2013. I wonder how their architecture
has changed since then?

~~~
SuperKlaus
They've since moved from EC2 to Facebook infrastructure, see their blog post
here:

[http://instagram-engineering.tumblr.com/post/89992572022/mig...](http://instagram-engineering.tumblr.com/post/89992572022/migrating-aws-fb)

------
lobster_johnson
Were they only using two RabbitMQ brokers? I have a habit of running one per
machine, which allows every app to simply connect to localhost; but on
virtualized clusters, RabbitMQ gets a network partition almost every other
day, and having just two would obviously cut down on the number of partitions.

------
alttab
If they are using EC2 hosts, I wonder why they don't use AWS SQS/SNS for their
queue systems and Dynamo for their materialized views.

~~~
adamnemecek
Probably due to lock in.

~~~
dclusin
Isn't the cost of those services also a function of the number of requests?
I.e., they bill on a per-request basis, not an hourly CPU-time basis?

