I wrote justin.tv's chat backend, in Python, using the Twisted network libraries. It has scaled to peaks of more than half a million concurrent chat connections, on 8 fairly modest commodity servers. Python is more than capable here, with the right networking approach. Feel free to ask me anything about it.
We're using Python + Twisted for our XMPP servers at HipChat (http://www.hipchat.com) and neither has let us down. Hopefully we can reach 500k+ connections on such a small number of servers as well.
One specific note about Twisted/Python vs node.js: The combination of Python's yield and Twisted's defer.inlineCallbacks decorator makes it really easy to write and maintain nonblocking code. In node I find it far too easy to get lost in a sea of callback functions. Here's what you get to do in Twisted: http://enthusiasm.cozy.org/archives/2009/03/python-twisteds-...
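To make the yield point concrete, here is a rough stdlib-only sketch of the general idea. It is not Twisted's implementation or API: `FakeDeferred`, `run_inline`, `fetch`, and `greet` are all made-up names for illustration. The driver resumes a generator each time the thing it yielded gets a result, so the handler reads top to bottom instead of as nested callbacks.

```python
# Stdlib-only illustration of yield-driven nonblocking code.
# NOT Twisted: every name here is hypothetical.


class FakeDeferred:
    """Stand-in for a deferred: a result that shows up later."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        # Call right away if the result is already known, else queue it.
        if self._fired:
            fn(self._result)
        else:
            self._callbacks.append(fn)

    def fire(self, result):
        self._fired = True
        self._result = result
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn(result)


def run_inline(gen_func):
    """Drive a generator: resume it whenever a yielded FakeDeferred fires."""

    def wrapper(*args, **kwargs):
        done = FakeDeferred()
        gen = gen_func(*args, **kwargs)

        def step(value):
            try:
                pending = gen.send(value)  # run until the next yield
            except StopIteration as stop:
                done.fire(stop.value)  # generator returned: publish the result
                return
            pending.add_callback(step)  # resume later, without blocking

        step(None)
        return done

    return wrapper


pending_lookups = {}  # name -> FakeDeferred, so the caller can fire them later


def fetch(name):
    """Pretend network call: returns immediately with an unfired deferred."""
    d = FakeDeferred()
    pending_lookups[name] = d
    return d


@run_inline
def greet(user_id):
    # Reads sequentially, but each yield hands control back to the caller.
    user = yield fetch("user")
    room = yield fetch("room")
    return "%s joined %s" % (user, room)
```

Calling `greet(1)` returns at once with a `FakeDeferred`; the body only moves forward as `pending_lookups["user"]` and then `pending_lookups["room"]` are fired, which is roughly where a real event loop would deliver network results.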
As a side-note, this is why I love the software engineering community: everyone's always so willing to share knowledge and help each other out. Let's hope we never lose that as a community!
Whatever language you choose though there's plenty of interesting scaling problems to work on ;)
That said, Tornado is a really neat piece of software, and it worked very well for FriendFeed and others. It really just came down to personal preference.
Do you need to make async calls to the database because your queries are slow?
Yeah, the documentation is awful. Fortunately I've pretty much always just assumed that all documentation is going to be bad, so I pretty much never even bother trying - I just read the source.
Have you looked at distributed counting in Cassandra for your counting needs yet? Great info on its development and use at Twitter. It seems like you too have lots of interesting things to count. Your initial choice of Postgres for everything was a bit interesting. Your problem seems like a perfect fit for a hybrid solution (which you are already implementing by way of redis) that I think more and more companies will come to embrace by hook or by crook.
General nosql at Twitter and a fair bit on Cassandra:
Specifics on Rainbird, the counting system at Twitter built on Cassandra:
[cross post from the blog]
The decision to go with PostgreSQL over Cassandra (or another distributed system like Riak or HBase) was simply because it gave us the most flexibility to change our product quickly while we operate at low scale. And if I'm honest with myself, we're operating at very low scale right now.
In the future as we scale up, Cassandra's distributed counters will be one of the first places we look.
Just my guess, but I'd be surprised if it's orders of magnitude different.
Scale up first. Use redis or memcached to store 'overview' stats. If you lose those stats, recalc from Pg.
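A minimal sketch of that pattern, under loud assumptions: sqlite3 stands in for Postgres and a plain dict stands in for redis/memcached, purely so it runs anywhere. The `events` table and the function names are invented for illustration and are not anyone's actual schema.

```python
import sqlite3

# Stand-ins so the sketch is self-contained: sqlite3 plays Postgres (the
# source of truth) and a dict plays redis/memcached (the disposable cache).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (channel TEXT NOT NULL, kind TEXT NOT NULL)")
cache = {}


def record_event(channel, kind):
    """Write to the source of truth, then bump the cached counter if present."""
    db.execute("INSERT INTO events (channel, kind) VALUES (?, ?)", (channel, kind))
    db.commit()
    key = (channel, kind)
    if key in cache:
        cache[key] += 1


def event_count(channel, kind):
    """Serve the 'overview' stat from cache; on a miss, recalculate from the DB."""
    key = (channel, kind)
    if key not in cache:
        row = db.execute(
            "SELECT COUNT(*) FROM events WHERE channel = ? AND kind = ?",
            (channel, kind),
        ).fetchone()
        cache[key] = row[0]
    return cache[key]
```

If the cache is wiped (`cache.clear()` here, a redis restart in real life), the next `event_count` call pays for one `COUNT(*)` and then goes back to serving from memory.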
While we're here, here are some PG tips:
1. Make sure your PG install is actually tuned. I've seen countless postgresql.conf files at major companies with no tuning whatsoever. Take a look at this book: http://www.2ndquadrant.com/books/postgresql-9-0-high-perform... .
2. Make sure you're using PG 9+.
3. Use pgbouncer for connection pooling and try to write your app so you can use 'pool_mode = transaction'.
4. Please don't run PG virtualized.
That approach isn't unique to node.js, of course, but the combination of node.js and socket.io would be a valid way to improve scalability [0, Fig. 3] of most any app relying on long polling.
Also, the fact that it's web-based gives them the opportunity to build the protocol however they want. For instance, they can prettify code snippets, play videos, show images (as they are already doing), etc.
It is also a bit different in that each "channel" is a "group" in Convore, and a group can have multiple topics. So for the Django community, it's as if you had #django-performance, #django-host, #django-debug, and so on. The second you join a group, you can start new topics.
I know this is possible with IRC if you stretch it, for instance by building your own IRC client or hacking with mIRC scripts (been there, done that). You can of course also change the IRC protocol and host it yourself. But Convore just comes with all that for free, with a beautiful web-based interface.
Note also that I feel it's more serious, since you need to log in with your Facebook/Twitter account, which means fewer trolls.
The founder of Convore really liked IRC, so if you want, it is IRC+Twitter 2.0.
That said, I haven't really been using it simply because the web interface is so slow.