Keep in mind I gave it as an example that is *conceptually* simple to demonstrat...

Keep in mind I gave it as an example that is conceptually simple to demonstrate that scaling it with pretty much any technology is possible. It is still a lot of work, and expensive, and likely far from optimal.

The approach I suggested would create massive message spikes, and there's undoubtably optimizations. E.g. 30 million followers for the top users is a lot, but most of them don't tweet all that often. Perhaps it's better to keep the recent tweet of the top 1000 (or 100,000) users in memory in memcached's and "weave" them into the timelines of their followers instead of trying to immediately stored the in timelines for each user.

The point is scaling is pretty much never a language problem once your needs are large enough that they'll exceed a server or even a rack no matter how efficient the language implementation is.

At that point the language choice is an operational cost vs. productivity issue down to whether or not the CPU usage differences are sufficiently high to cost you enough to offset any productivity advantages you might get from the slower language implementation.

That is a valid concern. It's perfectly possible that Twitter is right in switching in their case. E.g. even a 10% reduction in servers could pay for a lot of developer hours when your server park gets large enough.

If you're on the tipping point where switching to a faster language implementation can keep you from having to scale past a single machine, then it's slightly different. But arguably if you're in that position, you should still plan for growth.