Sure, it's not impressive if all they have to do is append a 140-character message to a flat file or DB table. It's all the other manipulation that's interesting.
Knowing how little activity Twitter has really makes them look incompetent in light of the service outages they experienced over the last year. Either Ruby on Rails performs really poorly or the Twitter code monkeys got the system architecture wrong.
I say "probably" because I'm an outsider guessing, but I'm reasonably confident about that statement. Just not quite 100%.
(In the colloquial sense of 100%, of course; I'm not mathematically 100% sure Twitter exists.)
summary: "Rails and Ruby haven’t been stumbling blocks... The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January"
They're dealing with duplication, consistency, searchability, etc. across distributed storage systems and a variety of service mechanisms (more than just protocols). A user with 1,000,000 followers has every message duplicated and cached at multiple layers, up to 1,000,000 times. The message can show up on the web, through the API, or across any of the protocols, and it has to persist.
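To make the duplication concrete, here is a minimal sketch of fan-out on write. It is not Twitter's actual code; the in-memory dicts and the `follow` and `post` helpers are hypothetical stand-ins for the follower graph and the per-user timeline caches.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the follower graph and timeline caches.
followers = defaultdict(set)   # author id -> set of follower ids
timelines = defaultdict(list)  # user id -> list of (author, text) copies


def follow(follower_id, author_id):
    """Record that follower_id follows author_id."""
    followers[author_id].add(follower_id)


def post(author_id, text):
    """Fan out on write: store one copy per follower, return the copy count."""
    if len(text) > 140:
        raise ValueError("message exceeds 140 characters")
    for follower_id in followers[author_id]:
        timelines[follower_id].append((author_id, text))
    return len(followers[author_id])


# 1000 users follow one popular account.
for user_id in range(1000):
    follow(user_id, "popular")

copies = post("popular", "hello")
print(copies)  # number of timeline writes caused by a single message
```

One inbound message turns into one write per follower, which is why a single popular account can dominate the write load.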
They're not incompetent code monkeys; they just guessed way low when they designed the architecture. It looks like the big move was from a CMS model (not at all unreasonable for a "microblogging" service) to a messaging model. In hindsight, targeting messaging to begin with would've saved them some downtime, but wouldn't have been practical in the short term.
From http://gojko.net/2009/03/16/qcon-london-2009-upgrading-twitt... it sounds like the insertion rate averages around 9600 messages per second. That's avg follower count * avg tweets inbound, and that's only the input.
Edit: real, solid numbers are hard to come by. I suspect they're proprietary.
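For a feel of what the 9600 figure implies, the relationship insertions = avg follower count * avg tweets inbound can be inverted. The follower averages below are purely hypothetical, picked only to show the arithmetic; the real averages aren't given here.

```python
# Figure quoted above: roughly 9600 timeline insertions per second.
INSERTIONS_PER_SECOND = 9600


def inbound_tweets_per_second(avg_followers):
    """Invert insertions = avg followers * inbound tweets per second."""
    return INSERTIONS_PER_SECOND / avg_followers


# HYPOTHETICAL follower averages, for illustration only.
for avg_followers in (50, 100, 200):
    print(avg_followers, inbound_tweets_per_second(avg_followers))
```

Whatever the real average is, the inbound rate is smaller than the insertion rate by exactly that factor.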
Assuming that is true, which I have trouble believing, it sounds like they need some help with normalization. I understand the tradeoffs, but it just seems crazy.
select tweets.* from tweets inner join tweeters on tweeters.id = tweets.tweeterid inner join followers on followers.followerid = loggedinuserid and followers.followee = tweets.tweeterid
Yeah, okay, I know there are performance problems with joins, but there are performance problems with 1,000,000 inserts as well. You could cache the list of followees and use an "in" clause such as: select * from tweets where tweeterid in (cached_comma_separated_list_of_followee_ids)
or whatever to improve select performance.
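Here's a small runnable sketch of those two read-time queries, using Python's sqlite3 with an in-memory database. The table and column names follow the queries above; the sample rows and the `logged_in_user_id` value are made up for illustration, and this says nothing about how either approach performs at Twitter's scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (id INTEGER PRIMARY KEY, tweeterid INTEGER, body TEXT);
    CREATE TABLE followers (followerid INTEGER, followee INTEGER);
    INSERT INTO tweets (tweeterid, body)
        VALUES (1, 'from 1'), (2, 'from 2'), (3, 'from 3');
    INSERT INTO followers VALUES (10, 1), (10, 3);
""")

logged_in_user_id = 10  # made-up user who follows tweeters 1 and 3

# Variant 1: join against the followers table on every read.
joined = conn.execute(
    """
    SELECT tweets.id, tweets.body
    FROM tweets
    INNER JOIN followers ON followers.followee = tweets.tweeterid
    WHERE followers.followerid = ?
    ORDER BY tweets.id
    """,
    (logged_in_user_id,),
).fetchall()

# Variant 2: fetch the followee ids once (the list you would cache),
# then filter tweets with a parameterized IN clause.
followee_ids = [row[0] for row in conn.execute(
    "SELECT followee FROM followers WHERE followerid = ? ORDER BY followee",
    (logged_in_user_id,),
)]
placeholders = ",".join("?" for _ in followee_ids)
in_list = conn.execute(
    f"SELECT id, body FROM tweets WHERE tweeterid IN ({placeholders}) ORDER BY id",
    followee_ids,
).fetchall()

print(joined == in_list)  # both variants select the same tweets
```

Binding the ids as parameters instead of pasting a comma-separated string into the SQL keeps the "in" version safe from injection.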