
Keeping Instagram up with over a million new users in twelve hours - mikeyk
http://instagram-engineering.tumblr.com/post/20541814340/keeping-instagram-up-with-over-a-million-new-users-in
======
lenn0x
What kind of instances are you guys running for Redis/memcached? I am a bit
surprised by the numbers here, but to be fair I don't do much in the
virtualization world. With low cpu overhead, it sounds like you might be
saturating the number of interrupts on the network card if it's not a
bandwidth issue. Memcache can usually push 100-300k/s on an 8-core Westmere
(could go higher if you removed the big lock). Redis on the other hand with
pinned processes to each physical core can do about 500,000/s. We (Twitter)
saw saturation around 100,000 on CPU0; what tipped us off was ksoftirqd
spinning at 100%. If you have a modern server and network card, just pin each
IRQ for every TX/RX queue to an individual physical core.
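
As a rough illustration of the pinning described above, the sketch below
computes the hexadecimal CPU masks that Linux accepts in
/proc/irq/<N>/smp_affinity, one core per queue IRQ. The IRQ numbers and core
list are made-up placeholders (real ones would come from /proc/interrupts),
and the code only builds a plan; it does not write anything.

```python
def cpu_mask(cpu):
    """Return the hex bitmask (no 0x prefix) that selects a single CPU."""
    if cpu < 0:
        raise ValueError("cpu index must be non-negative")
    if cpu >= 32:
        # Masks for CPU 32 and above use comma-separated 32-bit groups,
        # which this sketch does not handle.
        raise ValueError("only CPUs 0-31 are supported in this sketch")
    return format(1 << cpu, "x")


def plan_irq_affinity(irqs, cores):
    """Map each queue IRQ to one core, round-robin if IRQs outnumber cores.

    Returns a list of (path, mask) pairs; writing mask to path as root
    would pin that IRQ to the chosen core.
    """
    if not cores:
        raise ValueError("need at least one core")
    plan = []
    for i, irq in enumerate(irqs):
        core = cores[i % len(cores)]
        plan.append((f"/proc/irq/{irq}/smp_affinity", cpu_mask(core)))
    return plan


if __name__ == "__main__":
    # Hypothetical IRQ numbers for four TX/RX queues and four physical cores.
    for path, mask in plan_irq_affinity([64, 65, 66, 67], [0, 1, 2, 3]):
        print(f"echo {mask} > {path}")
```

On a machine with hyperthreading, the core list would need to name one
logical CPU per physical core for this to match the advice above.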

~~~
mikeyk
Those are really useful numbers--I think a lot of it can be chalked up to
virtualization, but we should definitely explore more around IRQ pinning for
queues. Any good starting points / reading? Are you mostly using taskset?

~~~
rkurian
Why do you guys use both memcache and Redis? Redis also has LRU cache
functionality.

~~~
lenn0x
Because in-house we have a custom version of memcache. We rewrote memcache's
slab allocator, and for some use cases it is more memory-efficient than
Redis.

------
sciurus
A slight tangent, since I saw that Instagram is using both Graphite and
Munin: Collectd just added a plugin to send metrics to Graphite. You might
want to try it for tracking your machine stats over time.

<http://collectd.org/wiki/index.php/Plugin:Write_Graphite>
<http://collectd.org/>

~~~
ab
Along the same lines, are you doing anything special with munin to make it
fast? We've had performance issues with the RRDs and graph generation that led
us to pipe metrics to graphite with collectd.

~~~
mikeyk
We've had to split munin across three masters (by machine role) because the
graphing job was just locking on IO. Munin 2.0 moved over to all-dynamic CGI
graphing, but I haven't gotten the chance to play with it yet.

------
statictype
Isn't there a risk with EBS snapshots that the snapshot of a live instance
could have been taken while your db engine was in the middle of a transaction
and leave the data in the newly spun instance in an inconsistent state?

Is it that EBS snapshots are engineered to prevent this? Or just that it's not
likely to happen in practice?

~~~
mikeyk
Yes, there is--we take all of our snapshots from a slave, and we stop the
slave before taking a snapshot, then XFS-freeze all drives, then take the
snapshot, to ensure it's consistent.
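
The ordering described here can be sketched as a dry-run plan. `xfs_freeze -f`
and `xfs_freeze -u` are the standard freeze and unfreeze commands; the slave
stop/start commands, the mount points, and the snapshot command below are
placeholders, since the thread does not say which tools were used.

```python
def snapshot_plan(stop_slave, start_slave, mountpoints, snapshot_cmds):
    """Return the commands, in order, for a consistent multi-volume snapshot.

    The order follows the comment above: stop the slave, freeze every XFS
    filesystem, snapshot, then undo in reverse (unfreeze, restart the slave).
    """
    plan = [stop_slave]
    plan += [["xfs_freeze", "-f", m] for m in mountpoints]
    plan += snapshot_cmds
    # Unfreeze in reverse order so the last filesystem frozen thaws first.
    plan += [["xfs_freeze", "-u", m] for m in reversed(mountpoints)]
    plan.append(start_slave)
    return plan


if __name__ == "__main__":
    plan = snapshot_plan(
        stop_slave=["service", "postgresql", "stop"],    # placeholder
        start_slave=["service", "postgresql", "start"],  # placeholder
        mountpoints=["/data", "/wal"],                   # hypothetical mounts
        snapshot_cmds=[["echo", "snapshot all EBS volumes here"]],  # placeholder
    )
    for cmd in plan:
        print(" ".join(cmd))
```

A real script would also want to run the unfreeze steps even when a snapshot
command fails (for example with try/finally), since a filesystem left frozen
blocks all writes.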

~~~
nupark2
Are EBS snapshots not block-level atomic? In _theory_ you should get a PITR
image without stopping anything, assuming that:

1) The file system correctly orders or journals operations (I'm not familiar
with XFS, but this is the case with FFS2/FreeBSD, ZFS, ext3/4 journaling,
etc).

2) The database system correctly orders or journals operations, and properly
fsyncs to disk (which PostgreSQL does)

Of course, there's no harm to an abundance of caution with something like
this.

~~~
mikeyk
They are, but we software-RAID our EBS drives to get better write throughput,
_and_ we put the Write-Ahead Logs (WALs) on a different RAID from the main
database, so when you have both of those going on, we need something else to
atomically snapshot our PG databases.

------
terhechte
Congratulations. Really impressive how solidly you handled the Android
onslaught.

------
gflarity
We use statsd, graphite, redis and node as well. You might be interested in
some of my projects relating to these:

<https://github.com/gflarity/nervous> <https://github.com/gflarity/response>
<https://github.com/gflarity/qdis>

------
peterwwillis
Why use Graphite instead of Ganglia? Ganglia uses RRDs. It's been around
forever, it's fairly low on resource use, it's fast, and you can generate
custom graphs like with Graphite. I actually ended up doing some graphs with
google charts and ganglia last time I messed with it. (Also, nobody has really
simple tools to tell you which of your 3,000 cluster nodes has red flags in
real time and spit them into a fire-fighting irc channel so we had to write
those ourselves in python)

 _"Takeaway: if read capacity is likely to be a concern, bringing up read-
slaves ahead of time and getting them in rotation is ideal"_

Sorry but this is not 'ideal', this is Capacity Planning 101. If you're
launching a new product which you expect to be very popular, take your peak
traffic and double or quadruple it and build out infrastructure to handle it
ahead of time. I thought this was the whole point of the "cloud"? Add a metric
shit-ton of resources for a planned peak and dial it down after.

~~~
pkaler
_Sorry but this is not 'ideal', this is Capacity Planning 101. If you're
launching a new product which you expect to be very popular, take your peak
traffic and double or quadruple it and build out infrastructure to handle it
ahead of time. I thought this was the whole point of the "cloud"? Add a metric
shit-ton of resources for a planned peak and dial it down after._

Paul is nice so we are nice.

Last time I checked, I haven't built a service with +20mm users. I Googled
you. I don't think you have built a service with +20mm users.

Programming is hard. Scaling is harder.

Let's have some empathy here. I bet the Instagram team has parents and
siblings and significant others and friends that they haven't seen in a while.
I bet they have responsibilities that they have neglected to keep the service
up. I'd rather not poop on their head when they are trying to scale their
service by millions of users.

This stuff is hard. Leaving a comment on a news aggregation service is easy.

~~~
peterwwillis
I'm sorry that my comments come off as harsh, but the original line struck me
as so completely basic it's like something you would tell someone who had
never worked in IT. They clarified later that they had tried to plan ahead but
came up a little short, which I can understand; no estimation is perfect.

I have no idea how many users Sportsline had but it was a bunch. Peaks of 64k
hits per second on the dynamic layer, up to 8 gigabits sustained traffic in
one datacenter... it was pretty ugly on firefighting days. I don't mean to
poop on them, but if they're as big as they seem to be I hold them to a higher
standard than a 6 month old start-up fresh out of college.

I agree it's hard. The fact that they were able to handle the traffic they did
with only a small amount of downtime is a testament to the fact that they did
have their shit together (as well they should with the number of users they
had already).

------
olegi
Hello!

Question about the quality of insta-photos on Android.

I have a JPG from an SGS2 - <http://kia4sale.narod.ru/insta/01.jpg>

This is <http://kia4sale.narod.ru/insta/02.jpg>, an instaphoto (Earlybird) from
the Android version.

This is
[http://distilleryimage9.instagram.com/662ade7483ce11e19e4a12...](http://distilleryimage9.instagram.com/662ade7483ce11e19e4a12313813ffc0_7.jpg),
an instaphoto from the SGS2 JPG, but on an iPhone 4.

Question: why is the instaphoto from the Android version blurry?

Thanks.

------
jcastro
What OS are you deploying on EC2?

~~~
mikeyk
We use Ubuntu, running Natty Narwhal. Every Ubuntu version before that had
some (unique) bugs that would bite us under load. Natty has been by far the
most stable.

------
zupreme
Thanks for open-sourcing Node2dm. I think I'll take that for a spin this
weekend.

------
nboutelier
I'm curious to know what kind of EC2 instance they are running the master
PostgreSQL on and if they've had any write bottlenecks. I'm using Postgres for
an app, and am worried about running into write issues.

------
EAMiller
What sort of hosting do you use for your main Pg (and Redis) instances?

~~~
itsprofitbaron
They use Amazon EC2; there's some more info here:
[http://instagram-engineering.tumblr.com/post/12202313862/sto...](http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs)

------
andrewdunstan
PGFouine is nice, but it needs a major do-over. It would be good written with
a plpgsql backend running against database loaded csv log files, so that it
could handle huge logs, unlike now.

------
ganilb
I am curious to find out why there was a need to develop your own C2DM server
- what was lacking in Google's C2DM server? I am a C2DM newbie so pardon my
ignorance.

------
rkurian
It looks like you guys use Redis for a lot of different functionality. It
would be great to see an article on how you guys use Redis.

------
8ig8
> We use the counters to track everything from number of signups per second.

Per second... It must be quite a moment when you reach this point.

~~~
canop_fr
To put it in perspective: there are fewer than 5 human births per second.

So if they want to keep their counter greater than that for a long time,
they'll probably have to extend their market beyond humans.

------
jurre
Very interesting read, but doesn't New Relic do all these things for you?
Maybe it's not possible to use with their setup?

~~~
camwest
I'm interested in comparing statsd to a commercial product like New Relic as
well.

-C

~~~
Sujan
Statsd and NewRelic are very different.

NewRelic mainly gives you a predefined set of metrics, where you just have to
install the agent to get them. Then there's an additional module where you
can send your own set of metrics and display them.

Statsd on the contrary is 'only' a tool to collect and then display metrics.
You have to define everything you want to measure yourself (or use plugins to
your app).

So these two are definitely related, but better used for different (although
overlapping) jobs.
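
To make the "define everything yourself" point concrete: a StatsD counter is a
short text datagram of the form `name:value|c`, usually sent over UDP. Below is
a minimal sketch in which the metric name, host, and port are assumptions
(8125 is the conventional StatsD port).

```python
import socket


def format_counter(name, value=1, sample_rate=None):
    """Build a StatsD counter datagram such as b"signups:1|c"."""
    payload = f"{name}:{value}|c"
    if sample_rate is not None:
        # Optional sampling suffix, e.g. "|@0.1" for a 10% sample.
        payload += f"|@{sample_rate}"
    return payload.encode("ascii")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget a counter increment over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(name, value), (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    send_counter("signups")  # hypothetical metric name
```

Because the transport is UDP, the application does not wait for the metrics
daemon and keeps working if the daemon is down.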

------
bond
Does anyone have some info on the architecture required to maintain a service
like this? Servers, db, etc?

~~~
akent
[http://instagram-engineering.tumblr.com/post/13649370142/wha...](http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances-dozens-of)

~~~
bond
Thanks.

------
kunalmodi
Are you guys sharding Redis? Or does it all fit on a single machine?

~~~
mikeyk
We have a variety of Redis machines, some of them are in a consistent hash
ring (the ones we're using for caching); some are using modulo-based hashing
(the ones where losing data on adding more machines isn't an option), and some
are just single-node installs.
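
The practical difference between the two placement schemes shows up when a
node is added. The sketch below is not Instagram's code; the node names, key
names, and virtual-node count are arbitrary assumptions. It measures how many
keys change nodes under each scheme when a fifth node joins four.

```python
import bisect
import hashlib


def _hash(value):
    """Stable 64-bit hash (Python's built-in hash() is randomized per run)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest()[:16], 16)


def modulo_node(key, nodes):
    """Modulo-based placement: node index is hash(key) mod node count."""
    return nodes[_hash(key) % len(nodes)]


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


def moved_fraction(before, after, keys):
    """Fraction of keys whose node differs between two placement functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(10000)]
    old, new = ["r1", "r2", "r3", "r4"], ["r1", "r2", "r3", "r4", "r5"]
    ring_old, ring_new = HashRing(old), HashRing(new)
    print("modulo:", moved_fraction(lambda k: modulo_node(k, old),
                                    lambda k: modulo_node(k, new), keys))
    print("ring:  ", moved_fraction(ring_old.node_for, ring_new.node_for, keys))
```

On the ring, only keys that land on the new node move, which is tolerable for
a cache where a miss is cheap. Modulo placement remaps most keys when the node
count changes, so growing a store sharded that way means migrating data
explicitly rather than just adding a machine.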

~~~
cdelsolar
How do you handle write replication? I haven't found any good document on how
to do this (meaning, if the Redis master goes down, a slave should be promoted
to master immediately).

~~~
mikeyk
You can hypothetically use something like heartbeatd to do it; we run every
Redis master with an attached slave and manually failover for now.

For a small team like ours, we prefer solutions that are easy to reason about
and get back into a healthy state (it would take one server deploy to point
all appservers at new Redis master), rather than fully automated failover and
the "fun" split-brain issues that ensue. Of course that may change as we build
out our Ops team, etc.
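
A manual failover of this kind amounts to a small settings change that gets
deployed to the appservers. The sketch below is only an illustration: the
settings layout, pair name, and host names are hypothetical and not taken from
the thread.

```python
def promote_slave(settings, name):
    """Return new settings with the slave of Redis pair `name` as master.

    The input is left untouched so the old settings remain available for
    rollback; deploying the returned dict is the "one server deploy".
    """
    pair = settings[name]
    if not pair.get("slave"):
        raise ValueError(f"{name} has no slave to promote")
    updated = dict(settings)
    # The failed master is dropped; a fresh slave must be attached later.
    updated[name] = {"master": pair["slave"], "slave": None}
    return updated


if __name__ == "__main__":
    # Hypothetical host names; the thread does not describe the real layout.
    settings = {"feeds": {"master": "redis1a:6379", "slave": "redis1b:6379"}}
    print(promote_slave(settings, "feeds"))
```

The promoted server would also need to be told to stop replicating (the Redis
command `SLAVEOF NO ONE`) before appservers start writing to it.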

------
nodesocket
Great stuff, love node2dm, and didn't know about statsd + graphite.

~~~
mikeyk
Thanks! This was my first node.js project, would love any feedback on it.

~~~
nodesocket
Also, any reason you guys are not using MixPanel for storing events as well,
besides the cost?

------
Sujan
Thought about adding a tool like newrelic.com to your toolset?

~~~
mikeyk
NewRelic's product screenshots look really nice, but 1) the hosted nature and
2) the price were turnoffs. We might revisit it at some point, though--there
are a few features in there that would be awesome to have.

~~~
benologist
I think New Relic saves us money - the amount of optimizations we've done
because of them is just crazy and by now would have saved us a couple servers
and hardware upgrades at least.

Their sales dudes like startups, we were only lightly funded when we started
using them - garth@newrelic could probably help you.

------
drivebyacct2
What percentage of processing power is spent on making me look like a hipster?

~~~
mkjones
Pretty sure that all happens on the client (your phone).

