> Since Node.js runs on a single thread, everything that will block the event loop will block everything. That means that if you have a web server with a thousand connected clients and you happen to block the event loop, every client will just...wait.
Read more at https://www.airpair.com/node.js/posts/top-10-mistakes-node-d...
This kind of thing alone would be enough for me to recommend that most average developers just stick with PHP or whatever they are using now. At least the one-thread/process-per-request, shared-nothing architecture successfully mitigates the effects of unavoidable developer stupidity (or of "code you've pushed to production without review after more than a couple of beers", if you want to put it in the first person...).
Simply because: 1. bad code will inevitably be written, 2. bad code will inevitably end up, among other things, blocking the event loop, and 3. the application will need to keep working quickly, from the user's perspective, despite having bad code sprinkled through it.
Isn't there any "automagic" way to prevent this from happening with Node? Something like: if a request takes more than XXX ms, then at least start handling new requests in new threads?
I don't think it's fair to blame developer stupidity. Formulating something asynchronously makes your code quite a bit harder to read, test, etc., so it's only natural that developers would avoid it - and synchronous builtins like JSON.parse further encourage that. Note that the author of that post doesn't really give solutions for the problems he mentions; he just tells you how to detect them.
Meanwhile there are actually nice solutions for the problem of handling many concurrent computations with low overhead readily available (e.g. Go).
You can use Node's cluster API[1] directly or use a wrapping module such as Recluster[2]. With these, node will spawn a few children that get replaced when they die or time out (a little extra code is needed for the latter).
One can even do rolling upgrades with this for zero downtime releases.
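For reference, a minimal sketch of the cluster pattern (worker count and the HTTP handler are placeholders; Recluster and the timeout handling mentioned above are left out):

    // cluster sketch: the master respawns workers that die; workers share port 3000
    var cluster = require('cluster');
    var http = require('http');
    var os = require('os');

    if (cluster.isMaster) {
      os.cpus().forEach(function () { cluster.fork(); });
      cluster.on('exit', function (worker) {
        console.log('worker ' + worker.process.pid + ' died, respawning');
        cluster.fork();
      });
    } else {
      http.createServer(function (req, res) {
        res.end('handled by ' + process.pid + '\n');
      }).listen(3000);
    }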
If there is a known workload that will block for a while, one could 1) run the work in a child process, 2) consider using streams to process the data in chunks, or 3) break the work up into smaller operations chained with "setImmediate", which defers the next operation to a later turn of the event loop so other pending work can run in between ("process.nextTick" runs before pending I/O, so it won't actually give other operations a chance to resume).
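A minimal sketch of option 3, assuming the work can be split into independent chunks (processChunk and CHUNK_SIZE are placeholders for the real work):

    // Sketch: break a CPU-heavy loop into chunks so the event loop can breathe.
    var CHUNK_SIZE = 500;

    function processAll(items, processChunk, done) {
      var i = 0;
      (function next() {
        var end = Math.min(i + CHUNK_SIZE, items.length);
        processChunk(items.slice(i, end));
        i = end;
        if (i < items.length) {
          setImmediate(next); // yield so pending I/O can run before the next chunk
        } else {
          done();
        }
      })();
    }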
This is one of the problems Erlang solves: its internal scheduler keeps things running even if some code ends up in an infinite loop or otherwise misbehaves.
Infinite loops aren't necessarily a form of misbehavior in Erlang. It is how you keep Erlang processes up after all - just spawn a function that calls itself at the end. Though the vast majority of those have a receive statement so that they're not always running.
If a dev really writes code that blocks the event loop, and it's neither tested nor reviewed, I'm not sure he should be working on professional projects...
Anyway, with a single Node.js process it's not possible, but you can detect a stuck node and do load balancing across a cluster of Node processes (2 is usually enough), and simply reload a process if it gets stuck (while sending an email to ops, of course). You need such an architecture to get zero-downtime deployments in any case.
A low-priority admin page renders a 3000-row, 30-column table on the server using React. The query and the resulting page size are pretty small (< 2 MB), but it takes React 5 seconds to render.
I didn't expect it to be this slow, and I can't use client-side rendering.
Those kinds of frameworks are great, but they all have performance limitations. The typical solutions are either to find a faster framework or to write it by hand. If the issue is that it blocks the event loop, rather than the raw rendering time itself, you could spawn a process to do the React work.
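A rough sketch of that last option, with the render moved into a separate script (file names, message shape, and renderTable are all made up for illustration):

    // render-worker.js (hypothetical): runs the slow render off the main process
    process.on('message', function (rows) {
      process.send(renderTable(rows));
    });

    function renderTable(rows) {
      // placeholder for ReactDOMServer.renderToString(...) on the real table
      return '<table><!-- ' + rows.length + ' rows --></table>';
    }

    // in the web server (sketch)
    var fork = require('child_process').fork;

    function renderInChild(rows, callback) {
      var child = fork(__dirname + '/render-worker.js');
      child.once('message', function (html) {
        child.kill();
        callback(null, html);
      });
      child.send(rows);
    }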
You could break it up so that it processes the 3000 rows in chunks of, say, 500, and use setImmediate() between chunks so that node handles other pending events too.
That would still block the server for roughly 800 ms at a time, six times over, which is still unacceptable - let alone being a hack that would mean rewriting the React page to support partial evaluation.
Your suggestion is basically to write your own event loop and make every CPU-bound task manually yield to the single-threaded node event loop. That's crazy given that we've had multithreaded servers for decades that do this automatically and generalise to any CPU-bound task.
"You're too stupid to handle this, just stick to PHP" - really?
If a developer can't grasp Node well enough to avoid blocking the event loop, I certainly wouldn't want them trying to handle a thousand connected clients with PHP!
Anyway, to answer your question - within a single node process no, not really. If something on the event loop goes into an infinite loop, timeouts will be of no use, and all other open requests are gonna be toast. This is just a downside to cooperative multitasking.
Some problems might be easier to solve in one language than in another. If you can't implement quicksort in assembly, does that mean people would be right to stop you from implementing it in Python? It's a bit grotesque, but completely analogous to your objection.
It is the opinion of many that cooperative multitasking is not the way to go if you want to solve problems which require massive concurrency.
PS: In Racket, you could defend against threads going into infinite loops with custodians.
Rad list - I'm going to use your 1.2 automatic browser restarts via SSE trick!
Re: 1.1, automatic server restarts (for crashes etc.): there's no need for forever/supervisord/nodemon on current Linux - make a .service file for your app and it will automatically restart if it crashes, on all major distros.
Or better yet, use PM2 instead of forever. As someone who has spent years with forever, I was ultimately happy to drop it into the trash after discovering PM2.
This will show logs intertwined just like forever, merge logs from separate forked processes, and you can even throw in watch rules like nodemon's.
pm2 start process.json
Then to make all of your processes always restart do this:
pm2 save
pm2 startup centos
Replacing centos with your distro. Couldn't be easier. There are a lot of other features I don't use (such as deployment, which I suspect will be significant for some big websites), but definitely consider dropping forever/supervisord and even nodemon in favor of PM2. They merge pull requests quickly it seems, too.
The only issue I've had is that for some reason using the "watch" functionality on a lot of files causes massive CPU overload, or at least it did, it may be fixed by now. If you're going to use the watch functionality (to reload stuff like nodemon), consider whitelisting and not blacklisting files.
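For reference, a bare-bones process.json along the lines of what `pm2 start process.json` expects (names, paths, and instance count here are just examples):

    {
      "apps": [{
        "name": "my-app",
        "script": "./server.js",
        "instances": 2,
        "exec_mode": "cluster",
        "watch": false
      }]
    }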
Be sure to add a dependency on the network target (e.g. After=network.target) in the [Unit] section, or you could be in for a bad surprise after a reboot, when your service tries to start before the network comes up. Also, I doubt that systemd is available on 'all major' distros.
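Something along these lines, assuming a typical setup (paths, names, and user are examples; the After= line is the network-ordering bit the parent comment warns about):

    [Unit]
    Description=my-node-app
    After=network.target

    [Service]
    ExecStart=/usr/bin/node /srv/my-app/server.js
    Restart=always
    User=nodeuser

    [Install]
    WantedBy=multi-user.target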
What's up with node people packaging every single function into its own module? How can anyone think that a clean build needing to pull in over ten thousand files and taking twenty minutes is a good thing?
Why have, say, a sha1 module instead of a general crypto one? I've seen at least one module that wasn't more than ten lines of actual code; its package.json and "tests" were far bigger. It was something trivial that should be in a stdlib.
There are a lot of reasons to break modules into really tiny pieces, even if they are only a dozen lines or less. Some benefits:
- a terse and frozen API (like "domready" and "xtend") does not end up with the scope creep and bitrot that monolithic frameworks and "standard libraries" tend to carry
- it encourages diversity rather than "this is the one true way"
- it generally leads to less fragmentation overall (and tighter and more robust apps)
- each piece can be versioned, tested and bug-tracked independently
- once you get used to it and start finding modules you like, it can be incredibly easy to prototype and rapidly iterate with existing solutions. My 100+ modules are in a similar domain (graphics) and my efficiency for prototyping has improved because of them.
- it is better for reusability. If you have an algorithm that depends on jQuery or another monolithic framework for just a single function, it is hard to reuse (i.e. version issues, bundle size)
It took me a while to come around, but npm has really given me a better appreciation for small modules. :)
Just to add to your list, small modules are incredibly useful for browserify/webpack apps. For example, if I just want to md5 some strings and do a `require('crypto')`, I've just pulled about 100 KB of code into my app.
Instead, I can find some simple md5 lib (in this case I used one called "js-md5"), and get the same functionality in about 3KB.
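Roughly (treat the js-md5 call as a sketch; check the module's docs for the exact API):

    // browserify/webpack bundle: pulls in the ~100 KB crypto shim
    var createHash = require('crypto').createHash;
    var a = createHash('md5').update('hello world').digest('hex');

    // small dedicated module: a few KB
    var md5 = require('js-md5');
    var b = md5('hello world');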
It seems a lot of Node projects have gone the small module direction. Given that npm is a package manager that (mostly) works correctly - which is so much harder than it sounds - we can actually use small modules in our app to little detriment.
Yeah, an npm install might take a little bit longer than you'd like, although it's really not that slow. The duplication of modules isn't really an issue in server-side apps, and in webpack apps you can dedupe code pretty easily with webpack.optimize.DedupePlugin + gzipping, so it's not an issue there either.
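For the webpack side, that was roughly a one-liner in the config at the time (this is the webpack 1-era plugin; later versions handle deduplication differently):

    // webpack.config.js (sketch)
    var webpack = require('webpack');

    module.exports = {
      entry: './src/index.js',
      output: { path: __dirname + '/dist', filename: 'bundle.js' },
      plugins: [
        // collapse duplicate copies of the same module pulled in
        // through different dependency paths
        new webpack.optimize.DedupePlugin()
      ]
    };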
That sounds like poor tooling in JS, something solved by say, linkers, in other systems.
And I'm not exaggerating about build times. A simple grunt build doing some basic templating would take about 20 minutes. The majority of that time was pulling in the ~13,000 files a rather simple static website needed to build. I ended up tossing the idea of independent builds and just made a persistent build machine that symlinks in node_modules. I've got a million-line C program that takes less time to fully compile and link.
I would be curious to see what kind of tool could do this. It sounds like "tree shaking" (like dead code removal) which may be possible in the far future with ES6 imports. I imagine it would further add to build times.
Not sure how your rant about grunt tasks and templating relates to small npm modules. A 20 minute build time sounds like something was very wrong. My browserify (incremental) build time is < 100 ms, which I can handle.
Yeah perhaps current JS tools can't do dead code elimination because of the highly dynamic environment? Each module could provide some metadata on what it actually needs. I only target JS via cross compilers so I've never dealt with this problem.
The small module system ends up requiring a ton of files, which is slow. Incremental builds don't really apply to a clean build server where you are basically doing "git clone ... && make". The actual processing isn't my complaint, just the enormous overhead npm's style imposes. I mentioned grunt since just having that plus uglify or so ends up bringing in 13k files or something.
On the other side of the spectrum, you have large Java libraries where you need to somehow figure out which 20 classes collaborate to do a task, and the "documentation" link just drops you into one of those iframe-sidebar javadoc browsers.
That's the lack of design of a library, not really necessarily the size. You could easily have one entry point class with a bunch of ready to use functions and start your docs off there. Otherwise, you're just punting the discovery problem up a level. There's also a benefit in having multiple things in a module. You could make your code generic over the type of hash being used for instance.
As far as node, you could easily export 20 different hash functions for use, instead of a dedicated sha1 package. Then getting yet another when you want hmac.
The downside is nontrivial. In a repeatable build environment it can add tens of minutes of delays to a build as these tens of thousands of files aren't free. (And that was on a VM by one of the big providers, with top notch bandwidth.) Just even doing a local copy of so many files can take a while.
> In a repeatable build environment it can add tens of minutes of delays to a build as these tens of thousands of files aren't free. (And that was on a VM by one of the big providers, with top notch bandwidth.)
Why would you be pulling modules in from an Internet based repo on every build?
> Just even doing a local copy of so many files can take a while.
That means your cloud provider has poor file I/O performance, which is not unusual.
These were great ideas, but blocking the event loop could have been discussed more.
Great suggestions on how to profile, but some examples of how to write code that doesn't block could have been given (for example, spawning a child process to do a continuous set of discrete tasks).
Additionally, node's limits should be discussed. If a specifically intense compute task requires a lot of input and will also generate a lot of output (depending on how frequently this task occurs), that's where node starts to really get clobbered and the solution may be very difficult.
Thanks for the suggestions. Indeed I could have elaborated more on the subject, but I was afraid the article would be longer than needed. I'll keep these in mind for a future one though.
BrowserSync (http://www.browsersync.io) would be more useful than Livereload because it also reloads CSS without refreshing the page and synchronizes navigation if the page is open on several devices / browsers.
Try webpack-hot-loader, you can not only hot reload CSS, but you can hot reload some JS, most notably React components! It's really a treat when you see it work, and it's just one more way to tighten up your dev loop.
I don't think this is very beneficial when your project's complexity grows enough to require a non-trivial build. It's nice to have for simple projects, of course.
> Using control flow modules (such as async)
In my experience the async module produces verbose and ugly-looking code. In almost all cases promises are a better solution. Of course, even promises are pretty ugly, since they're a library-level "hack" for something that should be solved at the language level. The ES7 proposal includes async/await; maybe that'll finally solve this problem.
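For comparison, the same two-step operation in the three styles under discussion (getUser/getPosts are hypothetical; the callback versions take a callback, the promise versions return promises, and the async/await form assumes a transpiler such as Babel):

    // 1. Plain callbacks (what the async module helps you organise):
    getUser(id, function (err, user) {
      if (err) return done(err);
      getPosts(user, function (err, posts) {
        if (err) return done(err);
        done(null, posts);
      });
    });

    // 2. Promises:
    getUser(id)
      .then(function (user) { return getPosts(user); })
      .then(function (posts) { done(null, posts); })
      .catch(done);

    // 3. ES7 async/await (transpiled today):
    async function loadPosts(id) {
      var user = await getUser(id);
      return getPosts(user);
    }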
> Not using static analysis tools
I'd also recommend checking out TypeScript and tslint for complex Node.js applications.
`async/await` is only for sequencing; it's not going to replace anything other than `.then` chains. If you have chains long enough that you think `async/await` will solve your problems, you should think very hard about whether every item in the chain truly depends on the previous one - otherwise you are losing a lot of performance by serialising work that could run concurrently. I have often seen examples using generators and `async/await` that needlessly turn fast concurrent promise code into slower sequential code.
You can group multiple async operations with promises quite easily by using Q.all (of course this depends on the promise library you're using).
Even if async/await will only replace .then chains, that already fixes most of the issues. Most async code actually consists of .then chains (or just a single async function call).
Anyway, I agree with your point. Developers need to understand how to use async/await correctly.
I wonder if async/await could be implemented for grouping multiple async operations together..
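It already can, since anything that produces a promise can be awaited, including an aggregate like Promise.all - so independent operations can still run concurrently (fetchA/fetchB are hypothetical promise-returning functions):

    // Sequential: fetchB doesn't even start until fetchA has finished.
    async function slow() {
      var a = await fetchA();
      var b = await fetchB();
      return [a, b];
    }

    // Concurrent: both start immediately; await the combined promise.
    async function fast() {
      return await Promise.all([fetchA(), fetchB()]);
    }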
I agree with you in part; there are at least a few entries in the list which would be better described as "pitfalls" than as "mistakes". But "subjective" does not describe every entry in the list; there's absolutely no reason not to lint, test, and profile a production application.
I often see the complaint that you need a tool to restart node after making changes. And here it is as the first one of ten top mistakes which makes me think reading the article might be one of ten mistakes one would make today.
If doing 'Ctrl-C, node file.js' is too debilitating .....
Considering every mistake in the list had an accompanying "right way", I don't think it was a critique of Node.js. Some of these might rightly be called "Mistakes Web Developers Make".
Most of the callback spaghetti and event loop blocking stuff is just a critique of Node.js, not developer ability. Most other languages don't have these problems.
It's not a benefit. Most other modern languages are also async, they just don't force the programmer to think about it. Go, for example, is just as "async" when it comes to efficiency as Node.js (more so, actually, since it can use more than one CPU core.)
The whole notion that node.js (or Python's Twisted) makes everything more efficient by putting you in an event loop is just a cop-out for not having something more intuitive like blocking semantics with an evented I/O scheduler.
I wouldn't say "most" - it's basically just Go, Erlang/Elixir, and Haskell that offer blocking semantics with an eventloop under the hood. If you work in Java, the "easy" way is to use threads, and you have to think about it if you want non-blocking IO. If you work in C++, there is no "easy" way, and you have to think about it if you want non-blocking IO. If you work in PHP, Perl, Python, Ruby, or Javascript, they are single-threaded by default but give you basic UNIX concurrency/IO primitives to work with.
> If you work in Java, the "easy" way is to use threads, and you have to think about it if you want non-blocking IO.
Or you use Play/Akka and don't think about threads. Or Scala streams and don't think about threads. There's not much thinking about threads going on here. (A major reason why Go just makes me shrug is that I already have its good bits within easy reach on the JVM if I want them, and I don't have its bad bits.)
> If you work in C++, there is no "easy" way, and you have to think about it if you want non-blocking IO.
YMMV, but I don't find boost::thread_group and Boost.Asio terribly hard.
Also, you omitted C#, which has some pretty fantastic asynchronous tools that you don't have to think about at all.
I have yet to see a project running on a JVM (i.e., HotSpot, not Dalvik) where Scala was impossible to use. I have seen many projects with management that decried its use, but that's a self-caused and self-reparable problem.
All software problems are self-caused and self-reparable: if you want to use Language X and all your infrastructure is in Language Y, you "merely" need to write an interpreter for Language X in Language Y.
When management decries usage, it's almost always because they're running a cost/benefits analysis and the costs of using X (including writing language bindings for code, training programmers on the team, hiring new programmers for the team, context switching between multiple languages, and dealing with bugs and corner cases that have been encountered and fixed by other people in more mainstream languages) outweigh its benefits. Many of these costs are invisible to the engineer who originally proposed using X.
Like I said: Akka's right there. Works in nice, nonthreatening Java.
But the "repair" I was referring to is that a developer can leave rather than put up with conservative silliness if they so choose (and personally, I do, my last gig was a Scala one and so is the next). Java shops are not so rare as to be irreplaceable and any shop seriously worried about hiring somebody who can work with systems that are by now fairly well-understood isn't going to be a good place for any decent developer's career.
Or they have proper syntax that allows wiring things together in a sane style, letting the compiler generate the callback hell during codegen. In F#, async is just a library. C# has it built in via a dedicated keyword.
One is not even a tradeoff* and the other is completely unrelated to having an all async ecosystem.
*Anything available in other languages in place of callbacks (promises, generators, async await etc) is also available in node. If you are using callbacks you are just ignorant (or don't want to deviate from out of the box features in which case node is the worst possible thing for you). Nowadays there is not even a performance benefit so the only credible reason to use callbacks is not even there.
Developers also have the ability to avoid buffer overflows in C, yet they happen all the time, even for some of the most seasoned developers. This is an indication that there are issues with the language.
Said responsibility is then just not using C when you don't want buffer overflows, and by extension not using Node/JavaScript when you want readable code?
This expression is always evaluated, creating a new string object on the heap that has to be collected, whereas the format string one is only evaluated on demand (matching log level). That's why most logging frameworks go for the latter. I haven't seen benchmarks regarding the performance impact, though.
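To make the difference concrete, here's a toy level-aware logger (the logger itself is hypothetical; real logging frameworks typically do a similar level check internally):

    var util = require('util');
    var DEBUG_ENABLED = false;

    var logger = {
      debug: function () {
        if (!DEBUG_ENABLED) return; // skip formatting entirely when disabled
        console.log(util.format.apply(null, arguments));
      }
    };

    var user = { id: 1, name: 'demo' };

    // Eager: the concatenation and JSON.stringify run on every call,
    // allocating a throwaway string even though nothing gets logged.
    logger.debug('loaded user: ' + JSON.stringify(user));

    // Lazy: arguments are passed through untouched; the formatting
    // only runs when the debug level is actually enabled.
    logger.debug('loaded user: %j', user);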
I don't use console.log when I can. It's better to roll your own logging function which adds timestamps and automatically stringifies the object (preferably using a stringify function which handles cyclic objects).
The problem with using console.log(object) is that in Node.js the object is not stringified, and in browsers the debugger displays the current state of the object, not its state at the time it was printed. I found that out the hard way...
And when you're using your own logging function you can use a proper logging framework like winston to save the logs to a database and a file in addition to the console (we also send client-side logs to the backend using socket.io for easy debugging).
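A minimal version of such a wrapper, using util.inspect since it copes with cyclic objects (the winston and socket.io transports mentioned above are left out):

    var util = require('util');

    function log(level, message, obj) {
      var line = new Date().toISOString() + ' [' + level + '] ' + message;
      if (obj !== undefined) {
        // util.inspect snapshots the object and handles cyclic references,
        // unlike console.log(obj) in the browser or JSON.stringify on a cycle
        line += ' ' + util.inspect(obj, { depth: null });
      }
      console.log(line);
      // a real setup would also hand `line` to winston (file/DB transports)
      // and, on the client, ship it to the backend over socket.io
    }

    log('info', 'user loaded', { id: 42 });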
Why is this post on the front page, what's so special about it?
Read the TOC:
1 Not using development tools
2 Blocking the event loop
3 Executing a callback multiple times
4 The Christmas tree of callbacks (callback hell)
5 Creating big monolithic applications
6 Poor logging
7 No tests
8 Not using static analysis tools
9 Zero monitoring or profiling
10 Debugging with console.log
The author covers trivial points, or things that have already been written about a few times.
It would be interesting to see whether a post titled "Top 10 Mistakes C# Developers Make" with comparable content would get the same attention and encouragement. Maybe if it were written by Jon Skeet...