
The Node benchmark is flawed, though. Add something like

require('http').globalAgent.maxSockets = 64;

at the top of the Node script if you want a fair comparison with the async PHP version. The bottleneck here is bandwidth, not the runtime.

On my laptop, the original script from the author took 35 seconds to complete.

With maxSockets = 64, it took 10 seconds.

Edit: And who is downvoting this? I just provided actual numbers and a way to reproduce them. If you don't like how the universe works, don't take it out on me.

This is the "insane default" that substack goes on a rant about in the hyperquest README: https://github.com/substack/hyperquest#rant

  There is a default connection pool of 5 requests. 
  If you have 5 or more extant http requests, any 
  additional requests will HANG for NO GOOD REASON.
This one trips up people on #node.js constantly and hopefully will be removed very soon.

It's fixed in master already. Test with the latest 0.11 to verify.

Unless this is what has changed, I'd say the default is sane and has a good reason. The docs (http://nodejs.org/docs/latest/api/http.html#http_agent_maxso...) state:

> agent.maxSockets: By default set to 5. Determines how many concurrent sockets the agent can have open per host.

This stops you accidentally overloading a single host that you are scraping. It would not (assuming it works as described) affect your app if you are making requests to many hosts to collate data. Many applications (scrapers like httrack, for instance) implement similar limits by default. If you are piling requests onto a single host, and you either know the host is happy for you to do that (i.e. it is your own service or you have a relevant agreement) or have put measures in place yourself to not overload the target, then by all means increase the connection limit.

You're close, but the rub is that every single HTTP request uses the same agent (globalAgent) unless it is specifically passed an individual http agent or "{agent: false}" in its options. So it is effectively a global connection pool. This has caused all kinds of performance issues in production applications. It can easily be switched off, but it is easy to miss in the docs. The default of 5 has met its demise in 0.11 - the new default is Infinity.

Nobody should be downvoting you, you raise an excellent point and you are of course right.

NodeJS v0.10.21 + Cheerio

real 0m47.986s user 0m7.252s sys 0m1.080s

NodeJS v0.10.21 + Cheerio + 64 connections

real 0m14.475s user 0m8.853s sys 0m1.696s

PHP 5.5.5 + ReactPHP + phpQuery

real 0m15.989s user 0m11.125s sys 0m1.668s

Considerably quicker! As I said, I was sure NodeJS could go faster, but the point of the article was that PHP itself is not just magically 4 times slower; it is in fact almost identical when you use almost identical approaches. :)

They mention this in the update:

> Update: A few people have mentioned that Node by default will use maxConnections of 5, but setting it higher would make NodeJS run much quicker. As I said, I'm sure NodeJS could go faster - I would never make assumptions about something I don't know much about - and the numbers reflect those suggestions. Removing the blocking PHP approach (because obviously it's slow as shit) and running just the other three scripts looks like this:

I think the problem is you are feeding into the same loop. There are also probably ways of making the PHP script go faster still.

Also, that seems like a bit of a magic flag to add/tune. Why is that not the default, and would I have to keep tuning it for each of my apps?

The article starts the loop, though. If tuning one setting is a black art that an inexperienced dev would miss (which it of course is, I don't disagree at all there), then pulling in a new library to work around the limits (in this circumstance) of file_get_contents() is very much one too.

Neither the original benchmark nor the response were well researched IMO. This is the Apache vs IIS wars again, where good benchmarks that revealed useful information were drowned out by the noise of a great many poorly executed (or sometimes completely biased and deliberately poorly constructed) ones, with a bad test producing a bad result for one side being followed by an equally bad test trying to prove the opposite.

Hold on, why was my response badly researched?

The point was that NodeJS and PHP were pretty close, and I posted (before the update) that I'm sure Node could go quicker.

You run either of them in suicide mode to RUN ALL THE CONNECTIONS and you'll get a speed up. The point is that NodeJS is not magically 4 or 5 times faster than PHP; they're about the same when the packages you use support the async approach. This update proves they're exactly the same, but similarity is all I was going for.

This is already what the author did, though; the original benchmark that he replied to used Cheerio vs. phpQuery, but he reasoned that that was a losing battle anyway, and decided to test Cheerio vs. ReactPHP -- which requires a non-standard PHP extension called libevent.

It's not the phpQuery part that was replaced. It's the file_get_contents() call, which does the HTTP GET, that he moved to the React framework.

So my point stands... he already deviated from what was originally tested in order to favor PHP.

Why would that be acceptable, but not a simple config change for NodeJS?

I didn't know about this maxSockets limitation.

Is this something safe to raise?

Default is 5. Should be just fine if you don't have a specific use case that would require higher limits.

If you do, just crank it up; it should be safe unless you assign Infinity or something like that and push it too hard (then you have another problem, though). We use 15 in production, where our server parses a lot of external web pages.

The new default in master is Infinity. (Also, there's opt-in KeepAlive that actually keeps sockets alive even if there are no pending requests.)

The ulimit will prevent you from opening too many anyway. The HTTP client is not the correct place to implement job control and queueing with such a low default limit.
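Agreed; for what it's worth, that kind of job control is easy to do at the application level instead. A sketch (makeLimiter is a made-up helper, not a Node API):

```javascript
// Queue jobs yourself instead of relying on the agent's pool to throttle
// you. Each job receives a done() callback it must call when finished.
function makeLimiter(maxConcurrent) {
  var active = 0;
  var queue = [];

  function next() {
    if (active >= maxConcurrent || queue.length === 0) return;
    active++;
    var job = queue.shift();
    job(function done() {
      active--;
      next();
    });
  }

  return function run(job) {
    queue.push(job);
    next();
  };
}

// Usage: allow at most 5 fetches in flight at once.
var limit = makeLimiter(5);
['a', 'b', 'c'].forEach(function (id) {
  limit(function (done) {
    // ...issue the HTTP request for `id` here, then call done().
    setTimeout(done, 10); // stand-in for real work
  });
});
```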

First things first, IsaacSchlueter in the thread, wow :)

So, I'm stress testing our company's Node app to find how far we can push its performance. The first problem was the file descriptor limit, which I fixed with your "graceful-fs" module.

But now I'm hitting some "invisible" limit that I can't identify. My app doesn't show any error in the log.

Does "maxSockets" will help to receive more requests also or is just to make requests?

What performance problem with FDs? How many requests exactly are you handling per second, and with what code or processing? Are you sure it's not what he just mentioned, ulimit? The docs say client requests, so yes, it's just for making them.

I'm handling from 4k to 10k simultaneous requests.

And about the client requests, hmm ok, no problem. I will keep looking. Thanks!

Why do you need to handle 5k requests simultaneously? Maybe you mean per second? Maybe you can add a server and do round-robin DNS? Then you will be able to do double, unless there is a database bottleneck or something. But you said concurrent, and 4k truly concurrent requests is asking a lot.

Currently we have a C# application and are in the process of migrating to Node.js. I want to prove that Node.js can handle huge concurrency, so I want to benchmark the highest possible numbers I can. We are currently using AWS to load balance.

And it's a great exercise anyway.

We are using Redis for session storage. Do you think raising this parameter can affect Redis performance? Because on the Redis server the process is only at 5% CPU, while Node.js is at 99%.

It's pretty much always safe to set to Infinity or to turn the agent off. This is an anti-feature.

I'd call it a bug. It's fixed in master.

If we really want to get into benchmarks, LuaJIT with multithreading is almost 2x faster than both; it took 21 seconds to complete on my computer. And I'm willing to bet that multithreaded C would be even faster.

However, do you want to know what this benchmark proves? Absolutely nothing, as it has to query a website. So the response time of the website matters more than this test does.
