Author here, didn't really expect this to get picked up anywhere but since it has I'd like to point out the idea was to demonstrate that computationally expensive algorithms can be split across multiple iterations of the event loop to avoid blocking it.
In this case concurrent requests take advantage of each other's memoisation, which would be somewhat trickier to do with threads, as you'd probably need to worry about locking.
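For anyone who hasn't looked at the repo, the shape of the idea is roughly this. A minimal sketch, not the actual code from the repo, assuming plain Node:

```javascript
// Compute fib recursively, but yield to the event loop between steps
// so other requests can be served in the meantime. The shared memo means
// concurrent requests reuse each other's results -- and no locking is
// needed, since only one callback ever runs at a time.
const memo = { 0: 0, 1: 1 };

function fibAsync(n, cb) {
  if (n in memo) return process.nextTick(() => cb(memo[n]));
  // setImmediate defers the recursive step to a later loop iteration,
  // so a long computation never monopolises the process.
  setImmediate(() => {
    fibAsync(n - 1, (a) => {
      fibAsync(n - 2, (b) => {
        memo[n] = a + b;
        cb(memo[n]);
      });
    });
  });
}

fibAsync(30, (result) => console.log(result));
```

A second request for fib(30) arriving mid-computation benefits from whatever the first request has already memoised.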
Yes, you too can be required by the compiler to implement cooperative multitasking by hand. In 2011.
Yes. It is an answer to the criticism made, and I acknowledge that. But it is not a very good answer to the objection. The better answer is "don't do that in Node.js", which is still not all that great (it's really easy to accidentally write something that blocks badly), but is better.
Well the other option for computationally expensive code is to use some sort of worker that runs a sufficiently fast language.
> Well the other option for computationally expensive code is to use some sort of worker that runs a sufficiently fast language.
Which only helps if you know in advance that your code is going to be slow. If you somehow implemented an algorithm with quadratic complexity and never tested it on sufficiently large input, you might not realize what's going to happen until it hits production.
1. Nobody is denying that.
2. The issue is with the behavior of evented systems in general and node in particular in case of in-request CPU-bound code paths, namely that the whole server blocks killing concurrency and basically DOSing the instance.
I am not quite sure what it is you are trying to say. Node does (in this case) require a more hands-on approach than most other languages, but that is because of a subtle but important difference: Node uses cooperative concurrency, whereas threads are not cooperative. This is both a disadvantage (you are required to do more work and cannot take proper advantage of multiple CPUs) and a huge advantage (no locking is needed and you can share results between different execution points).
Of course the real issue is that you should choose a better implementation of the algorithm.
Yes. It is an answer to the criticism made, and I acknowledge that.
It's an answer that says: "You're using the wrong tool for the job."
Which is particularly weird given that Fibonacci itself is probably the most overused example of algorithm-to-promote-paradigm in computer science. Except it's for a different paradigm: recursion, not asynchronous IO.
Still, it's interesting in a recursive sort of way.
Fib is designed to be a piece of code that runs really slowly but doesn't require typing in many lines of code. This makes it a reasonable benchmark for things like "how fast can a function be called", and also a good example of "something that takes a long time".
"it's really easy to accidentally write something that blocks badly"
What is this nonsense? Isn't it time we knock this one on the head? Let me let you into a secret: If you write CPU intensive enough code in any framework you can eventually block all further requests. This whole debate was based on FUD and a complete misunderstanding of what node is all about. Read the previous posts and don't post generic, bullshit comments like "which is still not all that great".
"If you write CPU intensive enough code in any framework you can eventually block all further requests."
"Intensive enough" is a vague and fuzzy term which you can hide too much behind. So let me put it this way: Go grab Yaws, the Erlang web framework. Write a web page that goes into an infinite loop. Write some other web pages that work normally. Observe that visiting the infinite-loop web page once does not cause the rest of the web server to stop serving. Yaws may time it out eventually, too, not clear from a quick look at the docs. In fact it will only marginally decrease performance, even on single core machines.
Yes, if you bash on that page often enough you will eventually degrade service to an unacceptable level. But you will not bring down the whole server, or even that OS process, and it will take substantially more than one hit per process or one hit per core.
Now, go grab Node, and write a web page that goes into an infinite loop. You just brought that OS process down, from the user's point of view.
Node is qualitatively much easier to lock up an OS process with than Erlang. Or Haskell, or Go, or anything else with a modern task scheduler, which is an ever-increasing number of language platforms.
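The failure mode is easy to demonstrate without a web server at all. A sketch in plain Node (the 200 ms figure is just illustrative):

```javascript
// A synchronous busy-loop starves the event loop: this 10 ms timer
// cannot fire until the ~200 ms of CPU-bound work has finished.
const start = Date.now();
setTimeout(() => {
  console.log('timer fired after ' + (Date.now() - start) + ' ms'); // ~200, not 10
}, 10);
const busyUntil = Date.now() + 200;
while (Date.now() < busyUntil) {}  // stands in for the infinite-loop handler
```

Replace the bounded loop with `while (true) {}` and the process never serves another request, which is exactly the Yaws comparison above.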
But who deploys node in a single instance? There's documentation all over the place which tells you how to load balance over several instances - and it's easy to do so. This is my point, the node is cancer article was pure troll.
OK. Set up a load balanced infinite loop. Result: A load balanced infinite loop. This is not a win for Node. Load balancing across a number of hung processes buys you very little. (Not quite zero; you get a chance to detect the fact that it's hung and restart it, as long as these pathological requests aren't coming in fast enough. Hope the user who poked the bug doesn't hit refresh too many times!)
I still think you may not understand what modern schedulers end up doing here.
It isn't specific to Node. What's specific to Node is that there's a whole bunch of hype convincing people that Node is the epitome of multitasking, when in fact it's just yet another event-loop based system, subject to the same foibles. The same very well known foibles.
Node isn't a bad technology and I don't hate it. Well, I personally hate working in the event-loop paradigm (due to abundant experience) but that's no discredit to Node, which simply is what it is. The hype is toxic. The hype is basically full of flat-out lies. It teaches people that the state-of-the-art as of 1990 or so is the state of the art today. The hype claims Node is blazing a new path in the field of concurrency, when in fact it's traveling a 4-lane highway with fast food and hotels, while putting blindfolds on its partisans to hide them from the fact they're actually smack dab in the middle of civilization.
you realize that you're just making Ted's point for him from the opposite direction, right?
his point, when you look behind the trolling, is that node.js is not magical special sauce and that shitty coders who write poorly-scaling code will be shitty coders who write poorly-scaling code no matter what technology they use -- the "cancerous" properties of node arise simply because of the amount of groupthink that pitches the Next Big Thing as intrinsically better than everything that came before it.
if I had to sum up his philosophy in three sentences, here they are:
your job is software engineering. every bit of fancy shiny stuff that you add onto your software is another thing that can break. the most important decision you need to make in your job is determining when your tools are good enough and don't need further elaboration -- at a certain point you need to stop jerking off about your toolchain, and just ship your project.
Yeah, some of those blog posts don't make much sense. The queues article, for example. RabbitMQ and ActiveMQ are big Java things to keep your sysadmins up at night. ZeroMQ is just a networking library. Yes, they all end in "MQ", but at the end of the day, everything in computing is a message queue. If you don't like it, go back to typewriters :)
Fine. So you pick your tool (Java, Python, Node.js, presumably with an Nginx or Apache front end, though I'm not sure you want to put Node behind Apache), and use it.
I don't know enough about Node to really judge it, but there's nothing I've heard in this whole flame-war that really rules it out.
agreed -- for what it's worth, I've looked into node.js for some of my projects, then thrown it out because I already know other technologies which serve the same purpose. the problem isn't with the technology, the problem is with the marketing (and that includes grass-roots marketing through engineers who swear by the technology as one-size-fits-all.)
It is an old question 'How do we tell truths that might hurt?'.
If you try the well-reasoned analysis, you get passed over. It turns out that no one pays attention unless there is a fight happening (cf. TechCrunch's reporting style).
'If the truths are sufficiently impalatable, our audience is psychically incapable of accepting them and we will be written off as totally unrealistic, hopelessly idealistic, dangerously revolutionary, foolishly gullible or what have you.'
The moral is: everyone admonishes a flame, but nothing gathers posts quite like one. If you think something is terrible, holding back will get you nowhere.
Personally I've never seen node.js pitched as a solution for newbies or sub-par coders toiling away in the enterprise trenches. I've always seen it marketed as a useful tool for people who know WTF they're doing.
buzz is omnidirectional -- when people start talking about a technology, platform, or stack, people of all shapes and colors will show up and use it. and when your product pitches itself as having super amazing performance due to this programming paradigm omg!, you are responsible for making its limitations known to all and sundry who use it, rather than assuming prior knowledge.
also, the joyent node.js homepage itself claims, as a business advantage:
my point isn't that you shouldn't build up buzz. it's that, when buzz exists around something, your job as a platform implementor includes making people aware of the things your product can't do well, and pitfalls the end-user might run into.
saying "well, it's their own fault for not being clueful enough to know what they were doing wrong, this technology is for pro hackers only!" is developer-hostile.
Being a wee bit pessimistically inclined, I somewhat agree with the assessment of the node.js programmer pool (it seems similar to the first migration of PHP programmers to Ruby after the infamous RoR screencast), but I find it hard to attribute this to the developer(s).
Or in other words,
a) what did they do wrong?
b) What do you think would be the best way to "spin" a new platform like that without either being cluelessly elitist or giving the unwashed masses false hope?
disclaimer: I'm not a node.js expert by any means, and most of my day's work is on client code written in system programming languages (C and C++.) that which isn't, is in C#. I don't know of anything in particular that they're doing wrong because I don't really keep in touch with that particular PL community.
that said, my ideal pitch page would contain something to the effect of: "extensive standard library, including robust facilities for concurrent/non-blocking I/O". nothing marketing-y about non-blocking I/O or a given programming model making a language intrinsically "fast" or scalable, like nodejs.org's:
"This is in contrast to today's more common concurrency model where OS threads are employed. Thread-based networking is relatively inefficient and very difficult to use. [...] Almost no function in Node directly performs I/O, so the process never blocks. Because nothing blocks, less-than-expert programmers are able to develop fast systems."
that said, the node community might already be doing things like this which aren't obvious to the layman, and if so, good on them. good concurrent programmers are hard to find, and anything that encourages people to learn how to do concurrency well is a good thing.
Or at least that line of thought makes sense to me.
Really? I've almost never seen node.js pitched to anyone other than newbies (although not in the enterprise trenches for sure).
Rarely does the pitch involve being able to share libraries between the server side and the browser side (a wonderful benefit of node.js).
I haven't used Node.js for anything serious but the hype and especially the discussion around the now infamous node is cancer post sure makes it seem like Biilmann was correct when he said "For better or worse, Node is like the PHP of concurrency and ever so often worse is better"
Even if that's not true, if that demographic is the dominant users of and contributors to Node then that's what Node will become.
I find the notion of sharing libraries between the client and server kind of odd, in that people who talk about the "open web" think it's a desirable quality.
it allows you to rev a protocol faster, but at the cost of not being forced to have a point of reference that isn't also intrinsically tied to one implementation of the protocol, which seems a very un-open thing to do to me.
The libraries I was thinking of sharing are all ones that are tightly coupled to the specific pages they would be on and all related to UI.
There is much deeper integration you could do but I think we're on the same page as to why that's a bad idea.
I think this proves Ted's point. Look at that code. It is ugly as sin. Since I'm not an expert at node.js, I can't follow what is going on at all. Plus, this is memoizing. That is just cheating, it doesn't matter how you implement the algorithm if it memoizes.
If the creator of this github repo is to be believed, implementing a 3 line algorithm in node, you must use 20 lines of intricate, ugly code. If that is the case, node.js IS cancer.
Instead of memoizing in the server you could memoize in the browser and use socket.io to ask connected browsers if they have a memoized value for a given fib number. That would get you around the limitations that bascule mentioned.
I'd like to point out that this uses important improvements over Ziuba's naive strawman version: memoization, and callbacks using the event loop. The callbacks improve concurrency, allowing a single process/thread to multitask requests, and memoization reduces the total time spent per request.
Actually, if the async lib memoization facility shares values between requests, which seems quite sure to me, all requests but the first one are served from its cache in the test the author uses as example. This still doesn't highlight any strength of node.js IMO, asides from the easy shared memoization. It could be translated to Tornado-web, an async Python framework, almost literally.
> I'd like to point out that this uses important improvements over the naive Ziuba's strawman version
The Ziuba version was not "a strawman", it was an example used to trivially generate high in-request CPU load and demonstrate the misbehavior of evented system under this condition.
If you want to compute fibonacci fast, just use Binet's Fibonacci formula, it runs in O(1), then your only issue is that you're going to overflow very fast on doubles (not sure you'll even reach fib 1500). Using decimal, you can probably go higher before reaching your limits (though you'll run out of display space long before that if you print all digits): Python's decimal.Decimal reaches ~fib 4000000000 before an overflow error.
Binet's formula doesn't run in O(1) if you want an exact answer. Using a + b * sqrt(5) representations or floating point with sufficient precision, you can get O(n log n 2^O(log* n)) time (quoting Wikipedia's Furer's algorithm page here). Which beats O(n log n log log n) time, if you were wondering. If something beats that for computing Fibonacci numbers, that'd be pretty cool.
It does in fact share memoization. Easy shared state is a key strength of the single-threaded model. :)
Re: Node.js, you're right. The core of the debate is whether the single-threaded, evented model is a legitimate theoretical approach to concurrency. That programming style is the same across Node.js, Tornado-web, EventMachine, etc. It translates literally to Tornado-web because the idioms are the same.
Is it just me, or have all these implementations of fast Fibonacci servers missed the original intent of the 'Cancer' article? As I read it, it was more of a commentary on giving novice concurrent programmers enough rope to hang themselves than on the capabilities of Node itself.
I'm actually very confused by his criticism. You can write a CPU intensive task in any language, and you'll have the same problem.
Or is that the point? Some people believe Node.js will magically make all processing computations = 0? I'm all for discouraging the rumor that Node.js will solve every problem, but don't call it Cancer.
The issue is what happens when you write that CPU intensive task. If you do it in idiomatic Go, that same server will keep on responding to other requests in the meantime. With Node.js, that is not the case. You have to do things like, well, what this article does, to get it to work.
I'm still not getting it. Go might be able to handle an extra request or two during highly CPU-intensive tasks, but it will inevitably suffer the same consequence of locking up all its processes given enough requests.
Still, doing anything CPU-intensive on that layer is stupid in the first place.
A so called 'goroutine' locking up will not lock up the rest of that server. Other pages will still be served by that server. This has nothing to do with the efficiency of the underlying language, it has to do with Go doing cooperative multitasking transparently. It might as well be a separate process (in fact, for all you really know it could be).
The code seen here does cooperative multitasking explicitly. It is essentially explicitly specifying places where rescheduling can take place. The result is the same but with this Node code you are getting your hands far more dirty.
The point of the Cancer article is that since it (allegedly) is not made explicitly clear what is going on, programmers can deal far more damage to themselves than they would with other schemes.
I really don't know how to explain it any better than that.
PS: "cpu intensive" is relative. The fib example is a deliberate exaggeration of what you'd see in reality, to make the point trivial to observe.
Well, that just makes me more confused, because most languages I know handle multitasking explicitly, and they can all be dangerous when programmers exaggerate "cpu intensive" processes. I appreciate you trying to explain this guy's expectations, but I can't help but see this all as trolling FUD.
Many languages offer threads, which either the operating system or the runtime will preëmpt to ensure they all make progress in some vaguely fair way no matter how they behave. I haven't been required to write "I have more work to do, but this would be a good place to interrupt me" for fifteen years (when Win95 displaced Win3).
To answer your (possible rhetorical) question literally, the requirement here isn’t that it generate Fibonacci numbers efficiently, it’s that it implement the recursive algorithm efficiently.
It’s true that another algorithm will be much faster, but that deprives us of the opportunity to discuss how this implementation of the recursive algorithm delivers much higher performance than a naïve implementation.
The back story is that an abrasive blog post asserted that Node.js was a terrible platform for certain types of problems, and this algorithm was given as an example. The author here is simply showing that with care, Node.js will not be the problem.
For shits and giggles, here’s another Fib algorithm, chosen strictly for the pleasure of implementing it:
It’s tangential to this thread, but since it’s a topic close to my heart:
If you just implement this formula using doubles then it only gives the correct answer for input values up to about 70. Yes it’s pretty fast, but a static table of the first 70 Fibonacci numbers would be even faster. If you’re only interested in the first few dozen Fibonacci numbers then it doesn’t really matter what algorithm you use, as long as it isn’t naive recursion.
If you want to get correct answers for larger Fibonacci numbers, you need to work harder to use this equation. You need some sort of multiple-precision floating point arithmetic library, for example.
And once you’ve done that, the resulting algorithm is quite a lot slower than the best integer methods.
That formula fails for fib(71) with double precision floating point arithmetic:
In : def fib(n):
...:     a, b = 0, 1
...:     for i in range(0, n): a, b = b, a + b
...:     return a

In : def fib2(n):
...:     phi = (1 + sqrt(5)) / 2
...:     return round(phi**n / sqrt(5))

In : fib2(71)
Out: 308061521170130

In : fib(71)
Out: 308061521170129

And it's not that the latter is not exactly representable as a floating point number:

In : float(_)
Out: 308061521170129.0
So you'd have to use bigger precision numbers. How many digits are you going to use? You have to use enough not to produce a wrong result and not too many to slow down the algorithm. I bet that the resulting algorithm is slower than the standard [1 1; 1 0]^n matrix exponentiation. Note that your formula comes from the eigenvalue expansion of this matrix exponentiation, which -- despite what you might hear from undergraduate linear algebra courses -- is not a good method for matrix exponentiation. On the contrary, finding eigenvalues is done by a process analogous to matrix exponentiation: http://en.wikipedia.org/wiki/Power_iteration
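For the curious, here is what an exact integer method looks like. A sketch using the fast-doubling identities (which fall out of that same matrix form), written for Node with BigInt support:

```javascript
// Fast-doubling Fibonacci, derived from [[1,1],[1,0]]^n:
//   F(2k)   = F(k) * (2*F(k+1) - F(k))
//   F(2k+1) = F(k)^2 + F(k+1)^2
// BigInt keeps every digit exact, unlike the double-based closed form.
function fib(n) {
  function pair(k) {              // returns [F(k), F(k+1)]
    if (k === 0n) return [0n, 1n];
    const [a, b] = pair(k >> 1n); // recurse on floor(k/2)
    const c = a * (2n * b - a);   // F(2m)
    const d = a * a + b * b;      // F(2m+1)
    return (k & 1n) ? [d, c + d] : [c, d];
  }
  return pair(BigInt(n))[0];
}

console.log(fib(71)); // exact, where the double-based formula already fails
```

This does O(log n) bigint multiplications, which is why it beats repeated squaring of the raw 2x2 matrix by a constant factor.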
That's actually pretty good. Double precision floating point (IEEE format, anyway) can represent all positive integers up to 2^53 = 9007199254740992, which takes you up to Fib(78), so the fact that you can get up to Fib(76) means you are getting close to the top. Good enough for most work.
Above 2^53, double precision only represents even integers. Fib(79) is odd, so it can't be represented. Only every third Fibonacci number is even, and as the magnitudes grow you soon need multiples of 4, then multiples of 8, and so on, so while there will be representable Fibonacci numbers in those ranges, more and more will be omitted.
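The cutoff is easy to check. A quick sketch in Node:

```javascript
// Integers are exact in a double up to 2^53; beyond that, odd values
// collapse onto their even neighbours.
const limit = Math.pow(2, 53);        // 9007199254740992
console.log(limit === limit + 1);     // true: 2^53 + 1 is not representable
console.log(limit - 1 === limit - 2); // false: below 2^53 every integer is exact
```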
This is the wrong approach. The right approach is to write your computationally-intensive code so that it runs in another process, and then to talk to that process over the network. Network servers must only talk to the network.
If I have unused capacity, I can afford simpler threaded code. If I'm overloaded, I need some other platform for the real work. It sounds interesting but unless I need to write my own software load balancer, I'm not sure what node (as implemented today) is appropriate for doing.
To which the response is the same: simply don't block the event loop and use setTimeout.
The whole cancer thing is based on the premise that blocking the event loop is unavoidable. I'm attempting to say that it isn't. Jerf's points about how it can happen accidentally are perfectly valid, it's up to the developer not to do something stupid and to actually have decent tests and benchmarks in place - as it would be with any approach.
It's certainly true that significantly poor performance in code which doesn't release the loop will degrade a single-threaded event loop far more than a threaded approach.
CPU speed and bus speed is finite on the machine you're running on. Using setTimeout to poll the results of some thread you've spawned to run a calculation is _worse_ performance than just blocking in the first place, because now when you try to profile your application (how does one do that in node.js, anyways?) you're going to be looking at a million setTimeout closed loops.
Here's Ted's entire point: "The point is that the HTTP server isn't the same entity doing the application work."
..which has basically been true ever since mod_jk was created.
The point is moot. HTTP is just the transport protocol in this case. Nothing says it has to be the front end service that talks to the end user's web browser; the OP of the "cancer" article made that up. I can bind the listener of a node service to a UNIX domain socket and talk HTTP over that socket with a front end web server, instead of plain TCP (which reduces the IPC latency). It is IPC between distinct components. It delegates the dynamic page generation to another service. It is a text interface, although sometimes, well, most of the time, binary protocols outperform text-based protocols, so the theory is crap anyway. Isn't this "UNIX way" enough for some people? Forget the part that says any IPC adds extra latency to the request-response paradigm, aka the opposite of the stuff a web stack should do.
Can you tell me at least one example from the myriad xGI protocols for interfacing a web server with an application server that actually brings any benefit over having plain HTTP/1.1 and a reverse proxy? Does it help to have yet another protocol for doing basically the same task: IPC between a couple of servers? Any xGI client (aka web server) is basically a reverse proxy for an IPC protocol. Does it say somewhere, carved in stone, that this protocol must be other than HTTP? I think that most people that spend the time engineering yet another xGI solution simply don't see the forest for the trees.
PS: nobody stops anybody from dropping a scgi.js or fastcgi.js module for node.js if that keeps people warm at night. But maybe I'm one of those tired of this shit: engineering yet another xGI protocol for every programming language that floats around the web. NIH syndrome, I'd say. At least HTTP is ubiquitous, while most servers already HAVE a reverse proxy handler.
Edit: On an in-browser test with Chrome I get accurate values up to 75 with the closed form function and accurate values up to n=78 for the recursive / additive function.
You can probably get it to work by asking it for increasingly large numbers to break up the calculation into stages.
v8's current GC implementation has a hard memory limit of 1GB which is uppable to 1.7GB on 64bit systems. There's a new GC implementation in the works which is supposed to be able to beat this.
The memory usage is due to the overhead of nextTick creating a new stack each time, and the interpreter keeping track of where the callbacks need to lead. Splitting the calculation up into batches could be incorporated into the algorithm quite simply by adding an additional task to async.series which called fib(i) for i to n with a step of 10 when n is > 100.
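The batching idea, sketched out (hypothetical names, not the repo's code -- and using the iterative rather than the recursive formulation for brevity):

```javascript
// Do `step` units of work per event-loop turn, then reschedule, so a
// long computation interleaves with I/O instead of blocking it.
function fibInBatches(n, step, done) {
  let a = 0, b = 1, i = 0;
  (function batch() {
    for (const end = Math.min(i + step, n); i < end; i++) {
      [a, b] = [b, a + b];          // one additive step
    }
    if (i < n) setImmediate(batch); // yield to the loop between batches
    else done(a);
  })();
}

fibInBatches(78, 10, (r) => console.log(r)); // F(78), the double-exact limit
```

Because each batch is a plain loop iteration rather than a per-step callback, it avoids the nextTick stack overhead described above.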
Obviously there are much more efficient and effective ways of calculating the fibonacci sequence, but I quite like the recursive algorithm for its clearness and its amenability to memoisation.
But is the async module itself elegant? I guess it's relative, it's even less elegant not to use the module. The async module is still a hack, the cleanest one Node has.
I think the whole discussion about "cancer" wouldn't have happened if the V8 team added continuations by now. They mentioned that they will so we might be cleaning up big chunks of code.
It's funny how a tech can get so popular with these big warts. Probably because people use it in a very limited way, no one writes CPU intensive code because you rarely need it for serving sites and data. Then the clumsy callbacks aren't so bad.