Node.js powers comet for plurk (100k+ users at once) (amix.dk)
87 points by olegp on Jan 30, 2010 | 26 comments

I am here to answer questions. I will start by sharing some implementation details:

- we use long polling, but will add WebSocket support once we get more browsers that support it. The code to make it work is shared here: http://amix.dk/blog/post/19489
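
For a sense of what the server side of long polling looks like, here is a minimal sketch: a channel holds the response callbacks of clients whose requests are being kept open, and flushes them all when a message is published. The names (Channel, subscribe, publish) are illustrative, not Plurk's actual API.

    // Minimal sketch of server-side long polling (illustrative, not Plurk's code).
    // A client's request "subscribes" and its response callback is held open;
    // publishing a message flushes every held connection at once.
    function Channel() {
      this.waiting = []; // response callbacks of clients currently long-polling
    }

    Channel.prototype.subscribe = function (respond) {
      // in a real HTTP server, `respond` would write to and end the held response
      this.waiting.push(respond);
    };

    Channel.prototype.publish = function (message) {
      var pending = this.waiting;
      this.waiting = []; // each client issues a fresh poll after it gets a message
      pending.forEach(function (respond) { respond(message); });
    };

Each long-poll HTTP request maps to one subscribe call; as soon as the response is sent, the browser immediately starts the next poll.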

- we had a smaller comet system in the past that was based on Java + JBoss Netty. It didn't scale that well (it used ~10x the memory of the node.js solution and had a lot of quirks, like lost connections). Generally I thought Netty worked "ok", but I would recommend anyone doing anything serious to look at node.js

- node.js feels VERY natural, but that could be because I have coded so much in JavaScript :)

- I have done a presentation on comet with node.js+V8 to the Taipei Open Source Group which you can check out here: http://amix.dk/blog/post/19484

The technology is there to build next generation web-applications, what are you waiting for? :)

What drove you to writing backend code in JavaScript? Having written several projects (client-side, but minimal DOM junk) in JS, I cringe at the thought of having to use it where it's not required. Some people really seem to like their JS, though, so I'm wondering what drew you to it?

There are two reasons why I picked JavaScript. First, V8+node.js is an amazing runtime system. One should remember that V8 is implemented by a highly skilled Google team led by Lars Bak - who was the lead architect of Java's HotSpot VM (the VM Java currently uses). So the first reason is performance. node.js is also very small in size, so I figured I could patch it myself if anything wasn't working.

The second reason to pick JavaScript is that it's already geared towards event-based programming:

   - events are everywhere in browsers, and
   - closures are a natural part of the language, e.g.

       document.addEventListener("click", function() ...

Event-based programming is needed to implement highly scalable non-blocking servers, so JavaScript is a really good match...
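
The same browser-style pattern carries over directly to node. A sketch using node's own EventEmitter (the 'click' event and the counter are just for illustration):

    // Closures plus events, node-style: the listener closes over `count`,
    // just like a browser click handler closes over its surrounding scope.
    var EventEmitter = require('events').EventEmitter;

    var clicks = new EventEmitter();
    var count = 0;

    clicks.on('click', function () {
      count += 1; // closure captures `count` from the outer scope
    });

    clicks.emit('click');
    clicks.emit('click');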

And really, JavaScript is a great language and it can emulate almost any paradigm that you may like. node.js also introduces a module system which makes it much easier to manage code. JavaScript isn't perfect and there's tons of bad code written for it, but it's possible to create beautiful code in JavaScript :-)
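
As a sketch of that module system: each file exports an object, and closures give you private state without any class machinery. (The counter example below is mine, not from node's docs.)

    // CommonJS-style module sketch: in a real project this would live in its
    // own file (say counter.js) and be pulled in with require('./counter').
    function makeCounter() {
      var n = 0; // private via closure; nothing outside can touch it
      return {
        increment: function () { return ++n; },
        value: function () { return n; }
      };
    }

    module.exports = { makeCounter: makeCounter };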

Cool, thanks

How do you access data storage backends (Redis, memcached, your LightCloud and MySQL) with node.js? Any suggestions?

Edit: plus, I am confused about the presentation on comparison with Tornado slide. It seems Tornado has higher transfer rate. And after all, they just do a thin layer on top of epoll, what's the penalty here?

Given how young node.js is, it already has a lot of modules, including:

Redis http://github.com/fictorial/redis-node-client

memcache: http://github.com/elbart/node-memcache

MySQL support using DBSlayer

And it's fairly easy to implement other stuff using node.js's TCP module. This said, it could definitely use more modules, but it's a young project and I am sure they will come.
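
To give a flavour of what "implementing other stuff using the TCP module" means: most of the work is byte-level framing of the wire protocol. For example, a Redis command is encoded roughly like this (a sketch of Redis' RESP format; the function name is mine):

    // Sketch of Redis' wire protocol (RESP): a command is an array of
    // bulk strings, each prefixed by its byte length. A node TCP client
    // would write this string to the socket and parse replies similarly.
    function encodeCommand(args) {
      var out = '*' + args.length + '\r\n';
      args.forEach(function (arg) {
        out += '$' + Buffer.byteLength(String(arg)) + '\r\n' + arg + '\r\n';
      });
      return out;
    }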

Also MongoDB support: http://wiki.github.com/orlandov/node-mongodb/ (bleeding edge, but is working well for my experimental project)

Do you have any insights/opinions on why node.js beats Netty? It is something inherent in the JVM or is Netty's implementation flawed/not ideal for your scenario?

(E.g. I'm a node.js newbie, but my understanding is that it doesn't have the io thread/worker thread pool, pipeline/handler framework that Netty does. Still reading about it.)

Have you used Thrift w/ node.js?

There seems to be some JavaScript support for Thrift; this would be a great way to have node.js fit in with one's architecture.


Also what are you using for your base libraries?

For example one needs RFC3339 date support, can you just plug in something like Google Closure Library?
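
RFC3339 timestamps are small enough to hand-roll if you'd rather not pull in a whole library - a sketch (UTC only, second precision; toRFC3339 is my name for it):

    // Minimal RFC3339-style UTC formatter, second precision only.
    function toRFC3339(date) {
      function pad(n) { return n < 10 ? '0' + n : String(n); }
      return date.getUTCFullYear() +
        '-' + pad(date.getUTCMonth() + 1) +
        '-' + pad(date.getUTCDate()) +
        'T' + pad(date.getUTCHours()) +
        ':' + pad(date.getUTCMinutes()) +
        ':' + pad(date.getUTCSeconds()) + 'Z';
    }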


I am working on using the Closure libraries on the backend. I converted some of them and they seem to be working fine.

Do you anticipate that the use of WebSockets will improve performance? If so, can you estimate to what degree?

Not to peddle my own wares, but... ;)


Also a caveat follow-up - WebSockets don't play nice with some proxy/NAT/firewall setups. You either need HTTPS, or you have to fall back on comet.

I don't want to pry, but I'm really curious - do you work full-time on Mibbit, or is it a side project for you?

Reasonably full-time now, my brother is pretty much full-time on it now also (Not abstractbill, our older bro). Things are going well :)

I think it's always good to have a few things on the go though, so I doubt I'd ever be 'full-time' on any one thing ;)

It will, since you don't need to open and close connections all the time. I can't estimate to what degree, but I think communication will be much faster, since currently some time is spent re-creating connections.

Also, once the handshake is done, there's no back-and-forth of HTTP headers (which generate a lot of needless bandwidth).

What hardware are you using behind it? E.g., how much in resources is needed to support N users?

1 server (32GB of RAM, quad core). I don't know how much 1 server can scale to, but it's a lot, since a lot of the connections are idle a lot of the time.

How many node.js instances are you running on this machine? I am guessing 4 - 1 per core, each aimed at handling 50k idle connections? And if you are running more than one, how are you load balancing them? Nginx?

Also, I would like to note that I saw your Tornado vs Node.js hello world test and it made me curious, because we use Tornado and we really believed it was the fastest.

So I looked at your results and noticed Tornado had 100x more bytes transferred in your ab test. So I decided to run the same test on a raw Tornado HTTP server (not using the tornado.web framework), and these were the Hello World results.

These tests were run on a 3.66 GHz i7.

Tornado (single instance):

    import tornado.httpserver
    import tornado.ioloop

    def handle_request(request):
        message = 'Hello World'
        request.write("HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n%s" % (
            len(message), message))
        request.finish()

    http_server = tornado.httpserver.HTTPServer(handle_request)
    http_server.listen(8009)
    tornado.ioloop.IOLoop.instance().start()

    ab -c 100 -n 4000

Requests per second: 10037.84 [#/sec] (mean)

Node.js (single instance):

    var sys = require('sys'),
        http = require('http');

    http.createServer(function (req, res) {
        res.sendHeader(200, {'Content-Type': 'text/plain'});
        res.sendBody('Hello World');
        res.finish();
    }).listen(8008);

    ab -c 100 -n 4000

Requests per second: 9159.65 [#/sec] (mean)

However, doing the test the way you did it did show your result of Tornado being 2x slower - but I don't believe that was a fair test at all.

Just my input.

As noted in that benchmark, it's non-scientific and should be taken with a grain of salt - like any other benchmark. It should be said, though, that I have evaluated both Twisted and Tornado with a lot of open connections in production, and they did not scale (they capped on CPU and used too much memory).

Also, regarding node.js: we run around 8 instances I think, and we plan to introduce more, since each node.js instance is a single process without threads, and currently our CPU and memory usage is very low. They are load balanced from our Python servers, but nginx would be a good load balancer since it's non-blocking.

It should be said that I have coded a lot in Python and I love Python - so I don't have any bias, and I would gladly use Python if it could scale.

I think the speed difference you're seeing may come from how you're writing the data: node requires multiple function calls to write the response (unless you use a raw TCP socket), while Tornado appears to take it all in one function call as a single string.

My point was not to say that Tornado is faster, but that his test is far off from proper results.

His test showed that Tornado was 2x slower than Node.js because he did it wrong. I actually find Node.js rather impressive =)

In your Comet with node.js and V8 presentation, you said that Plurk was originally using Netty to handle 200k open connections. About how much memory was that consuming?

10-20GB depending on the traffic level. node.js uses around 10x less than that - even though it's a much more sophisticated solution.

Are you using the hashlib library (http://github.com/brainfucker/hashlib)? (I'm just wondering =)

Did you need any other functionality, and do you have any interest in new modules for node?

node.js is surely cutting-edge, but I am more interested in seeing Google's Go used for comet in large-website practice.

