require('http').globalAgent.maxSockets = 64;
Put that at the top of the Node script if you want a fair comparison with the async PHP version. The bottleneck here is bandwidth, not the runtime.
On my laptop, the original script from the author took 35 seconds to complete.
With maxSockets = 64, it took 10 seconds.
Edit: And who is downvoting this? I just provided actual numbers and a way to reproduce them. If you don't like how the universe works, don't take it out on me.
There is a default connection pool of 5 requests.
If you have 5 or more extant HTTP requests, any additional requests will HANG for NO GOOD REASON.
> agent.maxSockets: By default set to 5. Determines how many concurrent sockets the agent can have open per host.
This stops you accidentally overloading a single host that you are scraping. It would not (assuming it works as described) affect your app if you are making requests to many hosts to collate data. Many applications (scrapers like httrack, for instance) implement similar limits by default. If you are piling requests onto a single host, and you either know the host is happy for you to do that (i.e. it is your own service, or you have a relevant agreement) or have put measures in place yourself to not overload the target, then by all means increase the connection limit.
NodeJS v0.10.21 + Cheerio
NodeJS v0.10.21 + Cheerio + 64 connections
PHP 5.5.5 + ReactPHP + phpQuery
Considerably quicker! As I said, I was sure NodeJS could go faster, but the point of the article was that PHP itself is not just magically 4 times slower; it is in fact almost identical when you use almost identical approaches. :)
> Update: A few people have mentioned that Node by default will use maxSockets of 5, and that setting it higher would make NodeJS run much quicker. As I said, I'm sure NodeJS could go faster - I would never make assumptions about something I don't know much about - and the numbers reflect those suggestions. Removing the blocking PHP approach (because obviously it's slow as shit) and running just the other three scripts looks like this:
Also, that seems like a bit of a magic flag to add and tune. Why is it not the default, and would I have to keep tuning it for each of my apps?
Neither the original benchmark nor the response were well researched, IMO. This is the Apache vs IIS wars again, where good benchmarks that revealed useful information were drowned out by the noise of a great many poorly executed (or sometimes completely biased and deliberately poorly constructed) ones, with a bad test producing a bad result for one side being followed by an equally bad test trying to prove the opposite.
The point was that NodeJS and PHP were pretty close, and I posted (before the update) that I'm sure Node could go quicker.
You run either of them in suicide mode to RUN ALL THE CONNECTIONS and you'll get a speed-up. The point is that NodeJS is not magically 4 or 5 times faster than PHP; they're about the same when the packages you use support the async approach. This update shows they're practically the same, and similarity is all I was going for.
Why would that be acceptable, but not a simple config change for NodeJS?
Is this something safe to raise?
If so, just crank it up; it should be safe unless you assign Infinity or something like that and push it too far (though then you have another problem). We use 15 in production, where our server parses a lot of external web pages.
The ulimit will prevent you from opening too many anyway. The HTTP client is not the correct place to implement job control and queueing with such a low default limit.
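One way to read that: the queueing policy belongs in application code, not in the HTTP client's socket cap. A rough sketch of such a limiter (the names are made up for illustration; real apps often reach for a library like async.queue instead):

```javascript
// Tiny concurrency limiter: at most `limit` tasks run at once,
// the rest wait in a queue until a slot frees up.
function makeLimiter(limit) {
  var active = 0;
  var queue = [];

  function next() {
    if (active >= limit || queue.length === 0) return;
    active++;
    var job = queue.shift();
    job.task(function done() {   // each task calls done() when finished
      active--;
      job.resolve();
      next();                    // a finished task frees a slot
    });
  }

  return function run(task) {
    return new Promise(function (resolve) {
      queue.push({ task: task, resolve: resolve });
      next();
    });
  };
}

// Usage: fire 20 fake "requests" but never run more than 5 at once.
var run = makeLimiter(5);
var inFlight = 0, peak = 0;
var jobs = [];

for (var i = 0; i < 20; i++) {
  jobs.push(run(function (done) {
    inFlight++;
    peak = Math.max(peak, inFlight);
    setTimeout(function () { inFlight--; done(); }, 10);
  }));
}

Promise.all(jobs).then(function () {
  console.log('peak concurrency: ' + peak); // prints "peak concurrency: 5"
});
```

Swap the `setTimeout` body for a real `http.get` and the limiter, not the agent, decides how hard you hit the network.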
So, I'm stress testing our company's Node app to find out how far we can push its performance. The first problem was the file descriptor limit, which I fixed with your "graceful-fs" module.
But now I'm hitting some "invisible" limit that I can't identify. My app doesn't log any errors.
Will "maxSockets" also help with receiving more requests, or is it just for making them?
And about the client requests, hmm, ok, no problem. I will keep looking. Thanks!
It's been a great exercise anyway.
We are using Redis as the session store. Do you think raising this parameter can affect Redis performance? In the Redis server, the process is only at 5% CPU, while Node.js is at 99%.
If you use it smartly, it's safe enough.
But you want to know what this benchmark proves? Absolutely nothing, since it has to query a website: the response time of the website matters more than the test itself.