A response to Node.js is cancer - The Diagnosis (joshuakehn.com)
78 points by kfalter on Oct 3, 2011 | 51 comments

I think a better response to Ted's argument would have been to rewrite the Fibonacci number generator as a forked-off process with a callback that updates the requesting page on completion.

The thrust of Ted's argument (as I saw it, and completely agree with) is that telling people "inexperienced programmers can create high performance systems" is misleading or dangerous without pointing out that you can easily shoot yourself in the foot if you don't understand what is (and isn't) handled "magically" for you.

An algorithm that performs poorly in PHP or Python isn't (necessarily) going to magically perform well under node.js's non-blocking execution strategy. Abstracting away blocking network or IO calls isn't going to help if you don't know (and notice) that your problem is the O(2^n) algorithm you're using.
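To make the exponential blow-up concrete, here is a minimal Python sketch that counts how many calls the doubly recursive definition makes. It assumes the convention fib(0) = fib(1) = 1; the `count_calls` helper is made up for illustration and is not taken from either blog post.

```python
def fib_naive(n, counter):
    """Naive doubly recursive Fibonacci; counter[0] tracks the number of calls."""
    counter[0] += 1
    if n < 2:
        return 1
    return fib_naive(n - 2, counter) + fib_naive(n - 1, counter)


def count_calls(n):
    """Return (value, number_of_calls) for fib_naive(n)."""
    counter = [0]
    value = fib_naive(n, counter)
    return value, counter[0]


if __name__ == "__main__":
    for n in (10, 20, 25):
        value, calls = count_calls(n)
        print(f"fib({n}) = {value}, computed with {calls} calls")
```

The call count roughly doubles every time n grows by one or two, and no execution strategy, evented or threaded, changes that.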

> He contests that because a shitty Fibonacci number generator performs poorly Node.js is worthless. I contend that shitty Fibonacci generators are shitty in whatever language or framework.

I don't think he contends that. He was just using Fibonacci as an example to show that IO is not the only way to block a process. Although he doesn't go into it, this issue is one thing that you have to weigh when you decide to use eventing instead of threads: can you break up any potentially long running computation to maintain availability?

This response misses the point, which is just that it's trivial to show that you can write blocking code in node. Any program that does significant computation will still block and pause all network IO (without setting up side processes, etc.). "The stupid fibonacci benchmark" is just a cliche way to chew up the CPU.

My main beef with node.js is the hype, to be honest. Many of the people who are really excited about it seem completely unaware that this kind of thing is hardly new. Tcl has been based around an event-loop for ages! Erlang was made in the mid-80s, and has a very mature (and thoroughly integrated) event-loop-based architecture. Also, its focus is on fault-tolerance / error handling in a multi-node system. I don't know what progress Node.js has made there, so far.

Virtually every request made to the server is blocking. The point of a server request is to retrieve/store data, process it, and respond. That's a blocking operation by definition. I can't respond before processing the data. I can't process it before retrieving it. You get the picture.

The neat thing about node.js isn't that it's "non-blocking" ... every major HTTP server out there is "non-blocking" in the same way. The neat thing is that it's JavaScript. I can use the same language on the server side as on the client side. I can use the same libraries, etc. reducing my code footprint. Reduced code footprint = less maintenance, fewer bugs, easier coding.

Indeed. I think that's a big part of the actual reason for its popularity. This "ultra scalable" stuff has helped to draw attention to it, though.

Heh, wasn't VB5/6 also event-driven?

I'm not familiar with it, but that would make sense. GUI systems are often built around event loops.

He totally misunderstands node's reason for existence, and also why it can be awesome.

His preferred infrastructure (once I cut through the swagger) is:

Highly-tuned request dispatch --> multiple processing threads (or processes) --> back out to requesting client

For certain workloads, e.g. long-polling, pubsub or high-number-of-client workloads, unix dispatch overhead is actually very significant. One cannot instantiate 500 python threads on most unix boxen without severe doom.

For these workloads, essentially ones where you need to push text around with a minimum amount of stream processing on top of it, node is totally, totally brilliant. It is crazy fast. Magically so, even.

If you think of node as a lightweight turing-complete dispatch engine, you will be happy. If, on the other hand, you believe that fairies and pixie dust mean you can write blocking, bloated javascript code and that node will solve your worries, then you'll be disabused of that dream quickly.

On the other hand, if you started down the road with node, and realized at some point that you needed to refactor some of your slow code, you could do so easily. Doing the reverse (scaling an alternate scripting architecture) is not always so trivial.

Worth adding that hundreds of lightweight threads are totally doable in Node using node-fibers. For benchmarks, check out the README at http://github.com/olegp/common-node/

I like node.js, but this post completely misunderstands "Node.js is Cancer". The point of that post isn't that node.js is slow; its point is that node.js does concurrency, but not parallelism; i.e., if your code is CPU-bound, it can only serve one request at a time.

It's a bit extreme to conclude that, since Node's built-in HTTP server works this way, it's "cancer." People still use it because it suits their purposes; there's nothing wrong with that.

AFAIK, nothing's stopping anyone from writing a CGI module for Node. It took years for Python to come up with WSGI and Ruby to come up with Rack. People are fed up with CGI and I don't blame Node's developers for excluding it.

> AFAIK nothing's stopping anyone from writing a CGI module for Node.

Indeed, nothing has stopped several people from doing it:

    $ npm search cgi
    NAME            DESCRIPTION                                                   AUTHOR             KE
    cgi             A stack/connect layer to invoke and serve CGI executables.    =TooTallNate
    fastcgi-stream  Fast FastCGI Stream wrapper for reading/writing FCGI records. =samcday           fc
    koku            Node.js bindings for the Mac finance app Koku                 =cgiffard
    nodeCgi         A fastcgi-like server designed to accept proxied requests from a web server and exe
    scgi-server     SCGI (Simple Common Gateway Interface) server                 =yorick            SC

Exactly. This is not about performance, it's about parallelism.

there is a good post about this on stack overflow... that parallelism in nodejs is more of a hack.. http://stackoverflow.com/questions/4631774/coordinating-para...

To be honest, most of his blog posts seem to be inflammatory in nature with a cute picture at the top. I wouldn't be surprised if Ted wrote it merely to get blog views.

I put Ted squarely in the category of "contrarian" rather than troll. I think it's important to have people calling out the Emperor for having no clothes, even if the Emperor is wearing clothes. It makes for debate, which in turn makes for reasoned decision making.

He's abrasive, true, but often has a point. See e.g. "The Blocking Consumer" part in "The Case Against Queues" (http://teddziuba.com/2011/02/the-case-against-queues.html) and "Stupid Unix Tricks: Workflow Control with GNU Make" (http://teddziuba.com/2011/02/stupid-unix-tricks-workflow-con...).

That's how it came across to me as well after I looked through the rest of his blog.

The article has a point, but poorly told. The problem is the following. Consider the following workload:

- 1 request to http://server/fiboslow

- 1000 requests to http://server/fast-response

The point is that node will not process any of those 1000 requests until the first one is finished, while any multithreaded/multiprocess server will do just fine, and process those 1000 requests in parallel.
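The workload above can be imitated on any single-threaded event loop. The sketch below uses Python's asyncio as a stand-in for node's loop (an assumption made for illustration; the handler names are hypothetical) and checks how many fast requests get served before the slow one finishes.

```python
import asyncio


def fib_naive(n):
    """CPU-bound work: it never awaits, so it never yields to the event loop."""
    return 1 if n < 2 else fib_naive(n - 2) + fib_naive(n - 1)


async def slow_handler(log):
    log.append("slow start")
    fib_naive(22)  # runs to completion while every other task waits
    log.append("slow done")


async def fast_handler(log):
    log.append("fast")


async def simulate(num_fast=1000):
    """Queue one slow request, then num_fast fast ones, on a single event loop."""
    log = []
    tasks = [asyncio.create_task(slow_handler(log))]
    tasks += [asyncio.create_task(fast_handler(log)) for _ in range(num_fast)]
    await asyncio.gather(*tasks)
    return log


def fast_served_before_slow_finished(log):
    """Count fast responses that appear in the log before 'slow done'."""
    return log[: log.index("slow done")].count("fast")


if __name__ == "__main__":
    log = asyncio.run(simulate())
    print("fast requests served before the slow one finished:",
          fast_served_before_slow_finished(log))
```

This only models the case where the slow request arrives first; requests already answered before it started are unaffected.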

<smug note> It's funny that the same people who criticized Java's AWT EDT as poorly designed are now praising the same thing in Node.js ;)

It should be noted that the Python code example runs in 4.038s on PyPy.

Standard Python: 1:07.31 :)

pypy 1.6: 5.0 sec

python 2.6: 58.2 sec

python 2.7: 49.8 sec

python 3.2: 51.4 sec

Nice. I just wrote a followup to Ted's post myself: http://blog.brianbeck.com/post/node-js-cures-cancer

I couldn't believe when folks on Twitter seemed to be buying his "argument."

(Apologies for the self-promotion. I had literally just published mine when I saw this.)

To those who post "Yes, it def is cancer" without spending several minutes running a test themselves: yes, Node is a cancer... because you cannot resist cancer.

Not sure whether pointing out the slowness of other languages is a considered argument to make here.

Yes, there are other ways of blocking apart from IO. But IO tends to be the main culprit: our web apps are mostly waiting for something, be it a database connection, a file system, or another HTTP request.

Making that asynchronous means we don't waste CPU time waiting. We let those external systems do their thing and when it's done, then the rest of the code runs. The important thing is that the IO isn't the bottleneck in node.js. It can do something else while the IO is doing its thing.

This means that node.js is better at dealing with IO-laden processes than the typical gamut of web frameworks.

CPU-bound processes are going to block unless they are performed outside of the main event loop. Dziuba is pointing that out, without offering up the obvious approach.

The approach to dealing with CPU intensive tasks is to delegate it to something that can be asynched out. If your platform of choice is multi-threaded, spin up a thread and run it there.

The node.js way, as I understand it, is to use IO to offload that intensive process somewhere else (at least until web workers are bedded in and ready to use). Since the IO is non-blocking, node.js doesn't consume many resources while waiting around for a response from the server/framework dealing with the CPU intensive activity.

The more I use it the more I see node.js as a pipeline connector between IO resources. Those other IO resources can either be other frameworks, or separate node.js instances that do one small job well. (So an IO process could just be a separate node.js instance that performs a CPU intensive task. In this way it doesn't affect the main request recipient in receiving more incoming requests).
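A rough sketch of that pattern, written with Python's asyncio rather than node (an assumption for illustration): the CPU-heavy work runs in a child process, and the event loop only waits on a pipe. `WORKER_SOURCE`, `fib_in_child_process` and the heartbeat task are hypothetical names; the heartbeat merely stands in for other requests being served in the meantime.

```python
import asyncio
import sys

# Hypothetical worker: a separate interpreter that computes one value and prints it.
WORKER_SOURCE = """
import sys
def fib(n):
    return 1 if n < 2 else fib(n - 2) + fib(n - 1)
print(fib(int(sys.argv[1])))
"""


async def fib_in_child_process(n):
    """Run the CPU-bound work in a child process and await its output over a pipe."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", WORKER_SOURCE, str(n),
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return int(stdout.decode().strip())


async def main():
    ticks = 0

    async def heartbeat():
        # Stands in for other requests the loop keeps serving meanwhile.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.001)

    beat = asyncio.create_task(heartbeat())
    result = await fib_in_child_process(20)
    beat.cancel()
    return result, ticks


if __name__ == "__main__":
    result, ticks = asyncio.run(main())
    print(f"result={result}, heartbeat ran {ticks} times while waiting")
```

Starting an interpreter per request is expensive, so a real setup would more likely keep long-lived worker processes around; the point here is only that waiting on the pipe does not block the loop.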

One multi-core server can have a dozen or more instances of node running, each doing their specialised tasks and talking to each other asynchronously via IO.

Sure, it's not beginner level stuff. Even Dziuba himself didn't point out the better approaches to his naive solution - in node.js or any other language. node.js is as bad as every other framework when it comes to naive implementations of recursive algorithms. But offloading the calculation out of the main event loop through non-blocking IO is a different solution that node.js offers. That's one key differentiator.

It would be interesting to see Dziuba demonstrate the implementation of his concocted problem in the framework / language of choice.

I had the pleasure of missing Ted's original post. I read it before this one, and it was pretty misinformed. I could write a similar article arguing that assembly is useless because it takes so much code to implement a web server using it.

If your server runs in a single thread AND you do expensive calculations per request, you're doing it wrong. You should be caching results, and you need to either move the processing work to a backend server (written in C or something) or shard the frontend across a lot of cores (multiple processes / webworkers). In any of these configurations, nodejs will work great on the front end.
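The caching part is easy to sketch. The snippet below is a minimal illustration using `functools.lru_cache` from the Python standard library; `cached_fib` and `handle_request` are hypothetical names, and a real server would also need to think about cache size and invalidation.

```python
from functools import lru_cache


def fib_naive(n):
    """Deliberately slow recursive computation (fib(0) = fib(1) = 1)."""
    return 1 if n < 2 else fib_naive(n - 2) + fib_naive(n - 1)


@lru_cache(maxsize=256)
def cached_fib(n):
    """Cache whole results so repeated requests skip the computation."""
    return fib_naive(n)


def handle_request(n):
    """Hypothetical request handler: returns the response body as text."""
    return str(cached_fib(n))


if __name__ == "__main__":
    for _ in range(3):
        handle_request(20)
    # Only the first of the three identical requests pays for the computation.
    print(cached_fib.cache_info())
```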

But, most of Ted's post was needless bile. The argument he made isn't justified by the evidence he gave. Don't bother trying to argue with him, it's not worth your time.

So, this may be a silly question, but what exactly are you doing that's going to use so much freakin' CPU?

Situations where blocking is going to be an issue:

1. You have roughly equivalent CPU usage per request, which means that you just have too many requests. Threaded or evented will both get bogged down here.

2. Most requests use a small amount of CPU, but every once in a while a request will require an ENORMOUS amount of CPU. You want the one person with the crazy demand to feel slow, but everyone else to be unaffected. How often does this happen, exactly?

Most modern web apps spend the majority of their time talking to databases, which in NodeJS is a nonblocking operation. If you need to steamroll your CPU frequently, then perhaps NodeJS is not for you, but I don't think that's a common use case.

I don't think using, say, 500ms worth of CPU is all that unusual of a need. If you try to do that with node.js, everything in your program will be blocked completely for 500ms. It doesn't make node.js useless, or 'cancer', but it is a real limitation.

A billion front-end calculations to display a webpage sounds like a hell of a lot to me. If that is common, then you can only do 2 reqs/sec/core either threaded or evented. (point 1).
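The arithmetic behind those figures, as a tiny sketch. The 2e9 simple operations per second per core is an assumed round number chosen to reproduce the "billion calculations" estimate, not a measurement.

```python
def max_requests_per_second(cpu_seconds_per_request, cores=1):
    """Upper bound on throughput when every request burns this much CPU time."""
    return cores / cpu_seconds_per_request


def operations_per_request(cpu_seconds_per_request, ops_per_second=2e9):
    """Rough operation count, assuming ops_per_second simple operations per core."""
    return cpu_seconds_per_request * ops_per_second


if __name__ == "__main__":
    print(max_requests_per_second(0.5))           # one core
    print(max_requests_per_second(0.5, cores=4))  # four cores
    print(operations_per_request(0.5))
```

The bound is the same for threaded and evented servers; what differs is whether cheap requests can slip in between the expensive ones.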

If the 500ms request is rare, then occasionally some requests will be delayed vs a threaded setup. Keep in mind, that V8 is about 1 order of magnitude faster than CPython/PHP.

There are nodejs web worker implementations to help with that kind of problem:


If you use webworkers for cpu-intensive tasks (like fib), nodejs will perform just as well as all the other web frameworks out there.

But it does not fix the issue of e.g. having used a quadratic algorithm which is breaking your server because in production a user is shoving an order of magnitude more data than you expected (or tested for): you used that algorithm in-request because it was fast. Now it's not fast anymore and your production is getting killed. That's all there is to it, until you fix your code the application is not degraded, it's DOS'd every time that user does something.

And if you do consider workers (because you don't have a better algorithm), what happens for the cheap version? Do you offload it to a worker as well, potentially incurring a spin-off cost greater than the cost of the computation itself, or do you end up with two different codepaths (one sync and one async, just to ensure you're getting as complex as you can) depending on the computation's expected duration?

Just offload it to a child process/worker, or use a job queue.

How exactly would you avoid that 500ms block in other platforms?

The platform does it: other requests are handled by other threads or processes (pooled or not), so one request is going to consume 500ms and the rest will keep on trucking concurrently. Unless you've gone above the capacities of the machine itself, other requests will be affected little, if at all.

just as in node where you run multiple processes against the same socket using something like https://github.com/LearnBoost/cluster

That's not avoiding it. The thread in question will still block for 500ms. The same holds true for a multi-process node app.

The point is that in a multi-thread architecture, you can still serve requests on the other threads. In a single-process non-blocking IO architecture, blocking CPU calculations will block everyone.
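A thread-per-request sketch of that claim, with hypothetical handler names. One caveat that is specific to this illustration: in CPython the global interpreter lock makes the threads take turns rather than run in parallel, but the interpreter still switches between them, so the fast requests typically finish well before the slow one.

```python
import threading


def fib_naive(n):
    """CPU-bound stand-in for per-request work (fib(0) = fib(1) = 1)."""
    return 1 if n < 2 else fib_naive(n - 2) + fib_naive(n - 1)


def serve(name, n, finished, lock):
    """Hypothetical request handler running in its own thread."""
    fib_naive(n)
    with lock:
        finished.append(name)


def run_threaded(num_fast=5):
    """Start one slow request, then num_fast cheap ones, each in its own thread."""
    finished, lock = [], threading.Lock()
    threads = [threading.Thread(target=serve, args=("slow", 27, finished, lock))]
    threads += [
        threading.Thread(target=serve, args=(f"fast-{i}", 5, finished, lock))
        for i in range(num_fast)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished


if __name__ == "__main__":
    # Completion order; the exact order depends on the scheduler.
    print(run_threaded())
```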

Big deal. Use workers to solve that. End of story.

would you mind giving an example of how you would code it so that that 1 request didn't block the rest?

in a browser I'd go with a setTimeout or setInterval hack to "fork" another thread. is this the same in node?

In Node Web Development [Packt], the author uses a Fibonacci algorithm as an example that could block the event loop.

It outlines one solution where the calculation is split into callbacks dispatched through the event loop, making use of process.nextTick()
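The same idea can be sketched outside node. The version below uses Python's asyncio, with `await asyncio.sleep(0)` playing the role of process.nextTick() (an analogy assumed for illustration; this is not the book's code). Each recursive call hands control back to the event loop, so a fast request queued behind the slow one is served before the slow one finishes.

```python
import asyncio


async def fib_cooperative(n):
    """Recursive Fibonacci that yields to the event loop on every non-trivial call."""
    if n < 2:
        return 1
    await asyncio.sleep(0)  # yield point: lets other queued tasks run
    return await fib_cooperative(n - 2) + await fib_cooperative(n - 1)


async def slow_handler(log):
    log.append("slow start")
    result = await fib_cooperative(12)
    log.append("slow done")
    return result


async def fast_handler(log):
    log.append("fast done")


async def simulate():
    """Queue the slow request first, then a fast one, on a single event loop."""
    log = []
    slow = asyncio.create_task(slow_handler(log))
    fast = asyncio.create_task(fast_handler(log))
    result = await slow
    await fast
    return result, log


if __name__ == "__main__":
    result, log = asyncio.run(simulate())
    print(result, log)
```

The cost is that every yield adds scheduling overhead, so the computation itself gets slower. In current Node versions, setImmediate() rather than process.nextTick() is the usual way to let pending IO run between chunks.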

FogBugz, Joel's bug tracking system, has a feature that uses a Monte Carlo simulation to provide statistics for when you will be done with whatever software you are working on.

Is there a reason that these examples are using some sort of deeply nested call stack? Is it just to enforce 'slowness' in the function calls?

  def fibonacci(n):
      # Iterative version: n loop steps instead of an exponential number of calls.
      a, b = 0, 1
      for i in range(n):
          a, b = b, a + b
      return b

(cadged from zacharyfox.com) performs far, far better than the nested function calls. I imagine it would in JavaScript as well. In Python at least, all that function calling infrastructure is relatively expensive.

> Is it just to enforce 'slowness' in the function calls?

Yes. He was just illustrating a point about blocking on the CPU.

It's just a canonical form; it's far easier (IMHO) for a beginner to understand a recursively written Fibonacci number computer than your function.

I hear that. In this case, though, the entire blog article was complaints about slowness, or blocking behavior as seen through slowness.

I strongly disagree with what Ted wrote, but I really wish he would come out and enter a dialog with the community. There's a thread on the mailing list, twitter, and blog responses floating around now, but no substantial effort by Ted to respond to or even acknowledge the replies he's received.

he's responded here: http://news.ycombinator.com/item?id=3065098 with the same attitude he displayed in his original blog post. Not sure there's much point entering a dialog with him.

When you wrestle with a pig, you both get dirty, but the pig likes it.

Dziuba is the very paradigm of an ill-informed lazy troll. He has made his reputation giving other lazy people reasons to not learn new things.

Asynchronous behavior is far from a panacea. Sometimes it is the right tool for the job... But for a lot of problems, all it's going to do is make your code more complicated.

You made a good point: not many people are aware of the fact that the implementation of fibonacci number by Ted is just wrong.

Of course it's completely irrelevant to the actual debate at hand.

Poor flower doesn't get it -- Node.js is a cancer, and Ruby is dead.

It's remarkably easy to scatter a few process.nextTick() calls in long running functions to play nicely.

I'm going to defer to exarkun on how Twisted does this: http://twistedmatrix.com/documents/current/web/howto/web-in-... Note that his example doesn't bother to chew CPU, but you could chew CPU if you like, as long as you do it in a thread, and still best Node.
