Mike from Instagram here. We've now locked it down more (the actual admin contents were always properly protected).
We're also part of Facebook's bug bounty whitehat program (https://www.facebook.com/whitehat/bounty/), if anyone comes across something in the future, we welcome responsible disclosure and pay out bounties through the program as well.
I can't speak to Facebook's program off the top of my head, but Google's terms include "substantially affects the confidentiality or integrity of user data". A login page that shouldn't be exposed is a marginal finding and might not qualify.
Also: obviously I don't speak for either Facebook or Google, but I strongly recommend against brute-forcing login prompts just to prove that an exposed console is a real finding.
Mike Krieger (Instagram co-founder) here. Just wanted to offer some clarification, since there's some speculation about the reason for & scope of the verification mentioned in the article.
When we receive evidence of a violation of our site policies, we respond. This isn't a recent change; it's the way we've run our community from the beginning. In some specific cases, for verification purposes, we ask people to upload a government-issued ID in response. This affects a very small percentage of accounts, not most Instagram users. The ID is used only for account verification and isn't retained permanently.
We've had to split munin across three masters (by machine role) because the graphing job was just blocking on IO. Munin 2.0 moved over to all-dynamic CGI graphing, but I haven't had a chance to play with it yet.
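If anyone wants to try the 2.0 route, my understanding is it comes down to a couple of munin.conf directives (I haven't verified this on our boxes yet, so treat it as a sketch):

    # munin.conf (Munin 2.x): render graphs & HTML on demand via CGI
    # instead of in the periodic cron job that was hammering our IO
    graph_strategy cgi
    html_strategy cgi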
Those are really useful numbers--I think a lot of it can be chalked up to virtualization, but we should definitely explore IRQ pinning for the queues more. Any good starting points / reading? Are you mostly using taskset?
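For anyone following along: taskset pins processes, not interrupts; for the IRQs themselves you write a CPU bitmask into /proc/irq/<n>/smp_affinity. A rough Python sketch of spreading a NIC's queue IRQs across cores (the "eth0" queue naming is an assumption--check /proc/interrupts on your own hosts; needs root):

    import re

    def irqs_for(pattern):
        # Map IRQ number -> device name for /proc/interrupts rows whose
        # trailing name field contains `pattern` (e.g. "eth0").
        irqs = {}
        with open('/proc/interrupts') as f:
            for line in f:
                m = re.match(r'\s*(\d+):.*?(\S*' + re.escape(pattern) + r'\S*)\s*$', line)
                if m:
                    irqs[int(m.group(1))] = m.group(2)
        return irqs

    def pin(irq, cpu):
        # The affinity file takes a hex CPU bitmask (no 0x prefix);
        # this binds the IRQ to exactly one core.
        with open('/proc/irq/%d/smp_affinity' % irq, 'w') as f:
            f.write('%x' % (1 << cpu))

    # Assign each of the NIC's queue IRQs to its own core, round-robin.
    for cpu, (irq, name) in enumerate(sorted(irqs_for('eth0').items())):
        pin(irq, cpu)
        print('pinned %s (irq %d) to cpu %d' % (name, irq, cpu))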
They are, but we software-RAID our EBS drives to get better write throughput, and we put the Write-Ahead Logs (WALs) on a different RAID from the main database, so with both of those in play, a single-volume EBS snapshot isn't enough; you need something that snapshots all of the volumes atomically to get a consistent copy of the PG databases.
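In case the general shape is useful to anyone: freeze the filesystem so every member of the array quiesces at the same instant, kick off a snapshot of each EBS volume, then unfreeze (and do the same for the WAL array). A minimal Python sketch--the mountpoint, volume IDs, XFS, and boto3 are stand-ins here, not our exact tooling:

    import subprocess
    import boto3

    MOUNTPOINT = '/var/lib/postgresql'             # hypothetical mount
    VOLUME_IDS = ['vol-aaaa1111', 'vol-bbbb2222']  # hypothetical RAID members

    ec2 = boto3.client('ec2')

    # Block new writes so all member volumes share one point in time.
    subprocess.check_call(['xfs_freeze', '-f', MOUNTPOINT])
    try:
        # Snapshots only need to be *initiated* while frozen; EBS
        # captures the point-in-time image once the call returns.
        snaps = [ec2.create_snapshot(VolumeId=v, Description='pg raid member')
                 for v in VOLUME_IDS]
    finally:
        # Resume writes whether or not the snapshot calls succeeded.
        subprocess.check_call(['xfs_freeze', '-u', MOUNTPOINT])

    print([s['SnapshotId'] for s in snaps])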
To clarify: we did do a fair amount of capacity planning and elastic ramp-up / pre-allocation of our infrastructure, but no prediction is perfect, so the blog post was about diagnosing and addressing issues as they crop up under unprecedented load.