

Benchmarking Codswallop: Node.js vs. PHP - there4
http://philsturgeon.co.uk/blog/2013/11/benchmarking-codswallop-nodejs-v-php

======
eknkc
The Node benchmark is flawed, though. Add something like

require('http').globalAgent.maxSockets = 64;

at the top of the node script if you want a fair comparison with the async PHP
version. The bottleneck here is bandwidth, not the runtime.

On my laptop, original script from the author took 35 seconds to complete.

With maxSockets = 64, it took 10 seconds.

Edit: And who is downvoting this? I just provided actual numbers and a way to
reproduce them. If you don't like how the universe works, don't take it out on
me.

~~~
filipedeschamps
I didn't know about this maxSockets limitation.

Is this something safe to raise?

~~~
eknkc
Default is 5. Should be just fine if you don't have a specific use case that
would require higher limits.

If so, just crank it up, should be safe unless you assign Infinity or
something like that and push it too much (then you have another problem
though). We use 15 in production where our server parses a lot of external web
pages.

~~~
IsaacSchlueter
The new default in master is Infinity. (Also, there's opt-in KeepAlive that
actually keeps sockets alive even if there are no pending requests.)

The ulimit will prevent you from opening up too many anyway. The HTTP Client
is not the correct place to implement job control and queueing with such a low
limit by default.

~~~
filipedeschamps
First things first, IsaacSchlueter in the thread, wow :)

So, I'm stress testing our company's Node app to find how far we can push its
performance. The first problem was file descriptor limits, which I fixed with
your "graceful-fs" module.

But now, I'm reaching some "invisible" limit that I can't identify. My app
doesn't return any error in the log.

Does "maxSockets" will help to receive more requests also or is just to make
requests?

~~~
ilaksh
What performance problem with file descriptors? How many requests exactly are
you handling per second, with what code or processing? Are you sure it's not
what he just mentioned, ulimit? The docs say client requests, so yes, it's just
for making them.

~~~
filipedeschamps
I'm handling from 4k to 10k simultaneous requests.

And about the client requests, hmm ok, no problem. I will keep looking.
Thanks!

~~~
ilaksh
Why do you need to handle 5k requests simultaneously? Maybe you mean per
second? Maybe you can add a server and do round-robin DNS? Then you will be
able to do double, unless there is a database bottleneck or something. But you
said concurrent, and 4k truly concurrent connections is asking a lot.

~~~
filipedeschamps
Currently, we have a C# application and are in the process of migrating to
Node.js. I want to prove that Node.js can handle huge concurrency, so I want
to benchmark the highest numbers I possibly can. We are currently using AWS to
load balance.

And it's been a great exercise anyway.

We are using Redis for the session store. Do you think raising this parameter
can influence Redis performance? Because on the Redis server, the process is
only at 5% CPU, while Node.js is at 99%.

------
gopalv
Long-term PHP guy here (I maintained APC for years, though I've slowly given
up on it now), so I've worked a lot with ~2k-3k request-per-second PHP
websites.

The real trick here is async processing. A lot of the slow bits of PHP code
are down to people not writing async data patterns.

If you use synchronous calls in PHP - mc::get or mysql or curl calls, then PHP
absolutely sucks in performance.

Node.js automatically trains you around this with its massive use of callbacks
for everything. That is the canonical way to do things - while in PHP, blocking
single-threaded calls are what everyone uses.

The most satisfying way to actually get PHP to perform well is to use async
PHP with a Future result implementation. Being able to do a get() on a future
result was the only sane way to mix async data flows with PHP.

For instance, I had a curl implementation which fetched multiple http requests
in parallel and essentially let the UI wait for each webservices call at the
html block where it was needed.

[https://github.com/zynga/zperfmon/blob/master/server/web_ui/...](https://github.com/zynga/zperfmon/blob/master/server/web_ui/include/curl_prefetch.php)
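
The future/get() pattern described above translates naturally into JavaScript terms. Below is a hedged sketch (the `future` helper is hypothetical, not the linked zperfmon code): work starts eagerly, and `get()` waits at exactly the point the result is consumed.

```javascript
// Minimal future: start the work immediately, hand back a get() that
// delivers the value whenever it becomes available.
function future(start) {
  var settled = false, value, waiters = [];
  start(function resolve(v) {
    settled = true;
    value = v;
    waiters.splice(0).forEach(function (w) { w(v); });
  });
  return {
    get: function (cb) {            // "wait here" for the result
      if (settled) cb(value);
      else waiters.push(cb);
    }
  };
}

// e.g. kick off two fetches up front, consume each where the page needs it
var a = future(function (resolve) { setImmediate(function () { resolve('A'); }); });
var b = future(function (resolve) { resolve('B'); });
```

Here `a` resolves asynchronously and `b` synchronously; the consuming code does not need to care which.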

There was a similar Memcache async implementation, particularly for the cache
writebacks (memcache NOREPLY). Memcache multi-get calls to batch together key
fetches and so on.

The real issue is that this is engineering work on top of the language instead
of being built into the "one true way".

So often, I would have to dig in and rewrite massive chunks of PHP code to
hide latencies and get near the absolute packet limits of the machines -
getting closer to ~3500-4000 requests per second on a 16-core machine
(_sigh_, all of that might be dead & bit-rotting now).

~~~
Osiris
What are best practices for writing async code in PHP?

~~~
gopalv
A lot of extensions expose async modes, use them.

On the extension APIs:

curl-multi - [http://php.net/manual/en/function.curl-multi-select.php](http://php.net/manual/en/function.curl-multi-select.php)

memcached-getdelayed - [http://us2.php.net/manual/en/memcached.getdelayed.php](http://us2.php.net/manual/en/memcached.getdelayed.php)

mysqli-reap_async - [http://us2.php.net/manual/en/mysqli.reap-async-query.php](http://us2.php.net/manual/en/mysqli.reap-async-query.php)

postgres-send_query - [http://www.php.net/manual/en/function.pg-send-query.php](http://www.php.net/manual/en/function.pg-send-query.php)

gearman doBackground - [http://www.php.net/manual/en/gearmanclient.dobackground.php](http://www.php.net/manual/en/gearmanclient.dobackground.php)

Something like gearman queues basically take the asynchronous processing out
of the web layer into a different daemon. There were things like S3 uploads
and fb API calls which were shoved into gearman tasks instead of holding up
the web page.

Some of this stuff is very design-oriented. For instance, in most of my
memcache code there are no mc-lock calls at all - all of them are mc-cas
calls. A lot of the atomicity is done by using add/delete/cas, which involve
no sleep timeouts. A bit of it was done using atomic append, increment and
decrement as well.

SQL queries are another place where PHP doing actual work sucks for the web
apps. A bunch of the mysql/postgresql functionality within a lock is actually
moved onto stored procedures, instead of being driven by PHP.

[https://github.com/zynga/zperfmon/blob/master/server/schemas...](https://github.com/zynga/zperfmon/blob/master/server/schemas/top5_functions_30min.sql#L81)

So the code above is horribly written because you can't parameterize table
names or column names in PL/SQL. But that essentially cuts down the
involvement PHP has with the backend's locked sections.

Also a lot of the stats data was flooded onto apache log files instead of
being written out from the PHP code directly using an fwrite.

[https://github.com/zynga/zperfmon/blob/master/client/zperfmo...](https://github.com/zynga/zperfmon/blob/master/client/zperfmon-client-apache.conf#L2)

This uses the apache_note() function in PHP to log stuff after the request is
done and the connections are closed. That gets into the log files as %{name}n
fields in the access log.

You can see there that every single access log has an associated user, the
HMAC of the request and peak memory usage. All collected at zero latency to
the actual HTTP call.

The thing to avoid though is pcntl - it absolutely messes up all of
apache/fastcgi process management code.

This is not all of what I've done. I am sorry to say some of my best work in
this hasn't been open-sourced & has perhaps been killed since I left Zynga.

PHP backends I built using these methods were handling roughly 6-7 million
users a day on 9 web servers (well, we kept 16 running - 8 on each UPS).

Ah, fun times indeed - too bad I didn't make any real money out of all that.

------
senorcastro
I get sick of these language wars, especially the constant stream of PHP
ridicule that never seems to end. The positive I try to take away from all of
it is that there are a lot of people who are extremely passionate about
software development and are striving for better tools and ways to express
themselves. I want to believe that, despite the vitriol encountered in some of
these articles, there are people really trying to improve the technologies at
heart, rather than taking part in some kind of programming-language
apologetics. In regards to PHP, I think the ridicule has led to improvements
in the language, but the overall tone of some of these articles is still a
turn-off for me.

------
tegeek
I'm sick of all of these generic SPEED benchmarks. Let me tell you some of the
BIGGEST & REAL benefits of NodeJS where PHP SUCKS.

1. Takes 1 minute to install on any platform (*nix, windows etc.)

2. A modern package manager (NPM) that works seamlessly across all platforms.

3. All libraries started from zero, with async baked in from day one.

4. No need to use any 3rd-party JSON serialize/deserialize libs.

5. And above all, it's Atwood's law:

"any application that can be written in JavaScript, will eventually be written
in JavaScript".

[http://www.codinghorror.com/blog/2009/08/all-programming-is-...](http://www.codinghorror.com/blog/2009/08/all-programming-is-web-programming.html)

~~~
octo_t
Yet there are _no_ decent XML libraries for node.js.

I'd trade decent JSON support for decent XML support every single day of the
week.

And Scala/Java/JVM have already solved the problems you mention above.

~~~
simonw
"I'd trade decent JSON support for decent XML support every single day of the
week."

Just out of interest, why is that?

I work with JSON at least once a week, but it can be months between moments
when I need to work with any XML.

------
ohwp
Benchmarking is very hard, because even the same language can show different
results.

For example:

    
    
      for($i = 0; $i < count($list); $i++)
    

vs

    
    
      $count = count($list);
      for($i = 0; $i < $count; $i++)
    

Most of the time benchmarks prove how capable a programmer is, not the speed
of the language used.
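
The same loop-invariant hoisting, sketched in JavaScript for comparison (`countItems` is a hypothetical stand-in for PHP's `count()`):

```javascript
// Hypothetical stand-in for PHP's count(): an invariant call that a
// naive loop re-evaluates on every iteration.
function countItems(list) {
  return list.length;
}

var list = [10, 20, 30];

// Slow shape: countItems(list) runs once per iteration.
var total = 0;
for (var i = 0; i < countItems(list); i++) {
  total += list[i];
}

// Hoisted shape: the invariant call runs exactly once.
var n = countItems(list);
var hoisted = 0;
for (var j = 0; j < n; j++) {
  hoisted += list[j];
}
```

Both loops compute the same sum; only the number of `countItems` calls differs.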

~~~
humanrebar
Any decent compile-time optimizer will transform your first snippet into the
second one (or better). Some languages preclude that optimization at compile
time, but I presume that a JIT would also have little problem performing that
optimization.

That is, one could argue that a good language is one that lets developers
ignore trivial changes like this without hurting performance.

~~~
likeclockwork
I don't see how that would work. I'd say that the two snippets describe
different intentions and using one when you want the other is a case of not
saying what you mean.

A function call in a loop condition might have side effects or do something
very unorthodox.

~~~
humanrebar
I guess this is a problem with trivial examples.

Pulling redundant work out of loops is a category of optimization that is
widely used. In many cases, the optimizer can detect a lack of side effects on
$list inside the loop body and perform the above optimization.

~~~
esailija
The canonical PHP implementation is a simple interpreter that cannot even
dream of such optimizations. Hell, it uses unions to handle dynamic typing and
hash tables to represent objects.

The implementation by facebook (HipHop) might have such an optimization
though.

------
disdev
Good test.

At some point, you'd expect these arbitrary this vs. that comparisons to die
off. They haven't, and I'm guessing they won't.

Basically, it comes down to picking the tool that best supports your use case,
or being okay with a compromise. Like the SQL/NoSQL discussions recently...
Use it poorly and you get poor results.

------
idProQuo
I was always under the impression that PHP hate comes from design flaws that
make it easy to make mistakes, not from it being necessarily slower.

~~~
CmonDev
Node.js is also based around a subpar language, so it's not necessarily the
main cause either.

~~~
aaronem
One might argue that Javascript is considerably less sub-par than PHP, though;
speaking purely from my own experience, I've found that writing Javascript
involves a significantly lower probability of the language attempting, at
random intervals, to shatter my kneecaps with a crowbar.

~~~
regularfry
Both languages are painful, and both have a painful stdlib, but JavaScript's
stdlib is _much, much smaller_, ergo less messy.

------
geerlingguy
I've made a similar observation to the original post — in my case, moving a
bit of functionality from PHP to Node.js gave me 100x better performance:
[https://servercheck.in/blog/moving-functionality-nodejs-incr...](https://servercheck.in/blog/moving-functionality-nodejs-increased-server)

But the reason for this wasn't that Node/JS is faster than PHP; it was because
I was able to write the Node.js app asynchronously, but the PHP version was
making hundreds of synchronous requests (this is the gist of the OP).

The issue I have is that Node.js makes asynchronous http calls relatively
easy, whereas in PHP, using curl_multi_exec is kludgy, and few libraries
support asynchronous requests.

The situation is changing, but the fact remains that asynchronous code is the
norm in Node.js, while blocking code is the norm in PHP. This makes it more
difficult (as of this writing) to do any non-trivial asynchronous work in PHP.

------
lukeholder
I agree that comparisons between languages/frameworks are often unfair, and I
agree with everything Phil says, but there is a lot to be said for
language-level non-blocking constructs.

I am really enjoying reading Go code and seeing how people use concurrency;
everyone does it the same way. When I read Ruby, I would have to know the
particulars of a library like Celluloid or EventMachine, which made it harder.

------
joeblau
The "Thoughts" section was the most informative part of the benchmark which
underscores the way I, when I was working with PHP, operated. When I started
with PHP(2005), the frameworks were terrible, I would cobble together many
random coding examples from stuff I found on the web and just make my own
Framework up. I don't think PHP from a performance standpoint is any better or
worse, but the default examples that you generally see in the ecosystem
provide significantly worse performance. The one thing that Node clearly has
an upper hand on PHP with is the ecosystem. It's a lot easier for a developer
new to the Node ecosystem to hit that Node target than it would be for someone
of the same skill to hit the PHP target in terms of hours spent.

One funny thing is that the ReactPHP[1] site is visually similar to the
Node[2] homepage.

[1] - [http://reactphp.org/](http://reactphp.org/) [2] -
[http://nodejs.org/](http://nodejs.org/)

~~~
girvo
React was originally called Node.PHP or something similar, if I recall
correctly!

~~~
phpnode
[http://nodephp.org](http://nodephp.org) originally

------
wooptoo
Doing async in PHP still feels like strapping a dildo to a horse and calling
it a unicorn.

~~~
erikig
I'm a PHP user/fan and I find this as funny as it is true.

~~~
jbeja
Wow, I never thought I would actually read the word "fan" alongside PHP. It
made my day.

~~~
Spoom
There are plenty (including myself), we just avoid stating it in public for
fear of the instantly appearing anti-PHP trolls that lurk around every corner.

------
dude3
I have used RollingCurl (non-blocking cURL) to fetch multiple API requests at
once using PHP. It's really easy to implement using a simple class. The
example shows how you could build a simple, efficient scraper.

------
alextingle
Finally, proof that you _can_ put lipstick on a pig.

------
ausjke
ReactPHP does look great; I hope it reaches a stable enough state for
production use. Also, it seems it's maintained by only one person?

~~~
philstu
Two core devs with a few contributors.

A lot of the components are in production already, it was built by the
original developers to be used in production. It's on 0.3.0 for many parts,
which is no further behind where Node was when people started flapping about
it :)

------
hugofirth
Only a brit would use Codswallop :)

~~~
onion2k
I read the entire article in the accent of a West Country farmer. And it still
made sense. :)

~~~
philstu
[http://i.telegraph.co.uk/multimedia/archive/01603/wurzels_16...](http://i.telegraph.co.uk/multimedia/archive/01603/wurzels_1603475c.jpg)

------
alexyoung
Node isn't a framework.

------
jlebrech
So how does PHP fare when scraping JavaScript apps?

------
denysonique
Node.js is usually the obvious choice for web scraping:

Scraping using jQuery syntax such as:

    
    
      var names = [], surnames = [];
      $('table tr').each(function(ix, el) {
        names   .push($(el).find('td').eq(0).text());
        surnames.push($(el).find('td').eq(1).text());
      });
    

is more familiar to most web developers than the PHP syntax.

Even if Node was 5x slower than PHP I would still go for Node because of its
easy jQuery syntax.

~~~
jqueryin
Did you bother to read the post? It pitted two different DOM traversal
libraries against each other:

* cheerio ([https://github.com/MatthewMueller/cheerio](https://github.com/MatthewMueller/cheerio))

* PhpQuery ([https://code.google.com/p/phpquery/wiki/jQueryPortingState](https://code.google.com/p/phpquery/wiki/jQueryPortingState))

Both of these use a jQuery-esque syntax, so your comment regarding DOM
traversal in PHP is a moot point.

~~~
tehwebguy
Yeah the CSS style selectors and methods are the same, I assumed he was
referring to the fact that it's all JS.

When you are scraping it's great to be able to do a test run in the browser
console and then just paste the code into your node script without any
language porting.

It's not an argument that it's better or faster or anything than PHP, just
that some find it easier to hack a scraper together in this way.

