

The Technology Behind Convore - conesus
http://www.eflorenzano.com/blog/post/technology-behind-convore/

======
abstractbill
_Finally, Python may not be the best language for this real-time endpoint.
Eventlet is a fantastic Python library and it allowed us to build something
extremely fast that has scaled to several thousand concurrent connections
without breaking a sweat on launch day, but it has its limits. There is a
large body of work out there on handling a large number of open connections,
using Java's NIO framework, Erlang's mochiweb, or node.js._

I wrote justin.tv's chat backend, in Python, using the Twisted network
libraries. It has scaled to peaks of more than half a million concurrent chat
connections, on 8 fairly modest commodity servers. Python is more than capable
here, with the right networking approach. Feel free to ask me anything about
it.

~~~
powdahound
I'd like to second this. Although Twisted's documentation and 5-year-old bugs
will have you cursing the library's name at times, it really does perform well
once you figure it all out.

We're using Python + Twisted for our XMPP servers at HipChat
(<http://www.hipchat.com>) and neither has let us down. Hopefully we can
reach 500k+ connections on such a small number of servers as well.

One specific note about Twisted/Python vs node.js: The combination of Python's
yield and Twisted's defer.inlineCallbacks generator makes it really easy to
write and maintain nonblocking code. In node I find it far too easy to get
lost in a sea of callback functions. Here's what you get to do in Twisted:
<http://enthusiasm.cozy.org/archives/2009/03/python-twisteds-inlinecallbacks>
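
For anyone without Twisted to hand, here's a minimal sketch of the same idea
using the standard library's asyncio instead of defer.inlineCallbacks (a
swapped-in stand-in, not the code from the linked post). `fetch_user` and
`fetch_groups` are hypothetical helpers that fake nonblocking I/O with a short
sleep.

```python
import asyncio


# Hypothetical helpers: stand-ins for nonblocking network calls.
async def fetch_user(user_id):
    await asyncio.sleep(0.01)  # simulated I/O wait; other tasks run meanwhile
    return {"id": user_id, "name": "user%d" % user_id}


async def fetch_groups(user):
    await asyncio.sleep(0.01)
    return ["%s-group-%d" % (user["name"], n) for n in range(2)]


async def user_summary(user_id):
    # Reads top to bottom like blocking code, but each await hands control
    # back to the event loop instead of blocking the thread.
    user = await fetch_user(user_id)
    groups = await fetch_groups(user)
    return user["name"], groups


async def main():
    # Run three lookups concurrently on a single thread.
    return await asyncio.gather(*(user_summary(i) for i in range(3)))


if __name__ == "__main__":
    for name, groups in asyncio.run(main()):
        print(name, groups)
```

In Twisted the shape is the same: a generator decorated with
`@defer.inlineCallbacks` uses `yield` on a Deferred where this sketch uses
`await`, so there are no nested callback functions to keep track of.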

~~~
hammerdr
With Node.js, you can use defers (promises), as well. As long as you feel that
promises aren't too much mental overhead (obviously you don't), I would
recommend using promises for any sizable Node.js application.

<https://github.com/kriskowal/q>

------
siculars
Congrats on the launch! I've actually found myself using Convore more often
than I thought I would. Particularly because I'm often in places that block
irc.

Have you looked at distributed counting in Cassandra for your counting needs
yet[0][1]? Great info on its development and use at Twitter. It seems like you
too have lots of interesting things to count. Your initial choice of Postgres
for everything was a bit interesting. Your problem seems like a perfect fit
for a hybrid solution (which you are already implementing by way of redis)
that I think more and more companies will come to embrace by hook or by crook.

Continued success!

[0] General NoSQL at Twitter and a fair bit on Cassandra:
<http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King>

[1] Specifics on Rainbird, the counting system at Twitter built on Cassandra:
<http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011>

[cross post from the blog]

~~~
ericflo
Yeah, I've been following that work for quite a while, it's really impressive!
I'm a big fan of Cassandra in general (in fact, I wrote an example application
to help teach beginners about it: <https://github.com/ericflo/twissandra>).

The decision to go with PostgreSQL over Cassandra (or another distributed
system like Riak or HBase) was simply because it gave us the most flexibility
to change our product quickly while we operate at low scale. And if I'm honest
with myself, we're operating at very low scale right now.

In the future as we scale up, Cassandra's distributed counters will be one of
the first places we look.
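
To make "distributed counters" a bit more concrete, here's a rough sketch of
the general idea: each node only ever bumps its own slot, and the total is the
sum across slots, so replicas can merge without losing increments. This is an
illustrative grow-only counter, not Cassandra's actual implementation;
`ShardedCounter` and the node names are made up for the example.

```python
class ShardedCounter:
    """Grow-only counter: one slot per node, merged by per-node maximum."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.slots = {}  # node_id -> total increments recorded by that node

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("grow-only counter: amount must be non-negative")
        self.slots[self.node_id] = self.slots.get(self.node_id, 0) + amount

    def merge(self, other):
        # Each slot is written by exactly one node and only grows,
        # so the larger value for a slot is always the newer one.
        for node, count in other.slots.items():
            self.slots[node] = max(self.slots.get(node, 0), count)

    def value(self):
        return sum(self.slots.values())


if __name__ == "__main__":
    a, b = ShardedCounter("node-a"), ShardedCounter("node-b")
    a.increment(3)
    b.increment(2)
    a.merge(b)
    b.merge(a)
    a.merge(b)  # merging the same state again changes nothing
    print(a.value(), b.value())
```

The point of the per-node slots is that merges can happen in any order, and
more than once, without double counting.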

~~~
saurik
Are you willing to say how many simultaneous users you are handling at this
point? (I obviously totally understand if you aren't.)

~~~
ericflo
Unfortunately I can't give out any exact numbers :(

~~~
saurik
I realize you may also not want to answer this, and that's also alright, but I
figure I may as well ask, as I really am curious (and maybe someone else will
answer from the perspective of their company): why not? Whenever anyone has
asked me for a stat on Cydia, whether it be active devices over some time
range, daily revenue, costs related to <insert-subproject-here>, or what have
you, I tend to break out a quick SQL query and provide an exact answer
(assuming the question is of something I can measure: not all are); is this
stupid of me? I've noticed a ton of companies refuse to disclose numbers, and
I've always assumed that the result will be that anyone listening will just
assume "ok, so almost none then" unless you give them a good answer and back
up how you calculated it, but does this actually "hurt the cause"? (I do not
have the benefits of the years of startup experience that you have access to
by being a part of Y Combinator, so I try to get advice whenever I can ;P.)

~~~
axod
The top 20 public groups topped out at about 250 users last night, add on a
bit for the other public groups and a little for private groups. Maybe 400
concurrent users max so far?

Just my guess, but I'd be surprised if it's orders of magnitude different.

------
krakensden
Sort of a random query, but why are you using Celery with Redis? Last time I
looked into it, the documentation basically said "you can, but you should
really use AMQP".

~~~
ericflo
Because we already had Redis installed, and I like fewer moving parts rather
than more. So far, it's worked out just fine.

~~~
asksol
Also, from 2.2 on, Redis support is complete. It's not as reliable as AMQP,
though (no message acknowledgements, and you can lose minutes of messages when
not using append-only mode).
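
To illustrate why acknowledgements matter, here's a toy in-memory sketch (not
Celery or Redis code; `AckQueue` is a hypothetical class). Without acks a
message is gone the moment a worker pops it, so a crash mid-task loses it.
With acks the broker holds on to it until the worker confirms, and can
redeliver it otherwise.

```python
from collections import deque


class AckQueue:
    """Toy broker: messages stay 'unacked' until the worker confirms them."""

    def __init__(self):
        self.ready = deque()
        self.unacked = {}  # delivery tag -> message
        self._next_tag = 0

    def publish(self, message):
        self.ready.append(message)

    def pop_no_ack(self):
        # Fire-and-forget: the broker forgets the message immediately.
        return self.ready.popleft()

    def pop_with_ack(self):
        message = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        del self.unacked[tag]

    def requeue_unacked(self):
        # Called when a worker is presumed dead: put its messages back at
        # the front of the queue, preserving the original delivery order.
        for tag in sorted(self.unacked, reverse=True):
            self.ready.appendleft(self.unacked.pop(tag))


if __name__ == "__main__":
    q = AckQueue()
    q.publish("send-email")
    q.pop_no_ack()               # worker crashes here: message is lost
    print(len(q.ready))          # nothing left to retry

    q.publish("send-email")
    tag, msg = q.pop_with_ack()  # worker crashes before calling q.ack(tag)
    q.requeue_unacked()
    print(list(q.ready))         # message is available to another worker
```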

------
swanson
Kind of surprising to not see node+socket.io in there, but it is nice to see
some python projects (Celery/Eventlet) doing the same job. I'd be curious to
see if they end up swapping that out if/when scaling becomes an issue.

~~~
weixiyen
Swapping to node.js doesn't solve any of their scalability problems. The
problems they have to solve with scaling their current system will be
identical to how they will have to scale stream servers written in node.

~~~
Encosia
I only skimmed the article, but my understanding was that they're using long
polling to connect the clients to the message queue. In that case, using
node.js _and_ socket.io (as swanson suggested) could considerably reduce the
number of concurrent connections since clients with WebSocket or Flash support
wouldn't need to hold connections open waiting on publish events.

That approach isn't unique to node.js, of course, but the combination of
node.js and socket.io _would_ be a valid way to improve scalability [0, Fig.
3] of most any app relying on long polling.

[0] <http://websocket.org/quantum.html>

------
travisfischer
Thanks for the article Eric. I greatly appreciate your transparency and
willingness to share. It is a great contribution to the community.

------
pkteison
So it's IRC on the web reinvented again? What's the advantage, why not just
use an IRC client?

~~~
d0m
You can close your browser, go home, re-open your browser, and you've lost
nothing. You can see that as a permanent "screen" irc.

Also, the fact that it's web based gives them the opportunity to build the
protocol as they want. For instance, they have the possibility to prettify
code snippets, play videos, show images (as they are already doing), etc.

It is also a bit different in that each "channel" is a "group" in Convore,
where you can have multiple topics. So basically, for the Django community,
it's as if you had #django-performance, #django-host, #django-debug, and on
and on. So, the second you join a group, you can start new topics.

So, basically, I know it is possible with IRC if you stretch it... for
instance, building your own IRC client or hacking with mIRC scripts (been
there, done that). Also, you can of course change the IRC protocol and host it
yourself... But then, Convore just comes with all that for free with a
beautiful web-based interface.

Note also that I feel it's more serious as you need to log in with your
Facebook/Twitter account, which means fewer trolls.

The founder of Convore really liked IRC... so if you want, it is IRC+Twitter
2.0.

~~~
pjscott
You don't need to log in with your Facebook/Twitter account; I was able to
sign up just fine with an email address. It's just one of the authentication
options they provide. (It also lets them import your contacts from those
services, which is handy for people who are big into Facebook or Twitter.)

------
sliverstorm
Grah... I was very interested at first, because out of the corner of my eye, I
thought this said "... Behind Corvette". Curse you, disappointment.

