Ask HN: I really don't understand node.js, could someone explain it to me?
69 points by barredo on June 20, 2010 | 21 comments
That's it. I guess I'm trying to understand node.js as a traditional web server, where you put files in a folder and you're good to go.

Is this the case? How is node.js different from Apache/Nginx/Lighttpd/Cherokee/etc.?

Thanks.




So let's say you write a web app in Rails or Django. Of course all sorts of magic happens behind the scenes, but ultimately the whole thing boils down to a function, ( def handle_request(request) { return ... } ), which is given the request and returns a response. If you have to wait for something (the database or whatever), that function has to sit around waiting for it (i.e. "blocking") before returning the answer.

In Node.js, however much framework magic you have, it ultimately boils down to a function that gets a request and a response proxy ( function process_request(request, response) { } ). The function can do whatever it wants with this response object: send data right away, stash it somewhere to be picked up later, have it send data in response to some future event, whatever. Then the function returns and gets out of the way, and other stuff runs. But the response object is still sitting around, and whenever you tell it to, it can send some (or all) of a response back.
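
A minimal sketch of that "stash it and answer later" idea, in case it helps. The names `waiting`, `handleRequest`, `broadcast`, and `start` are made up for illustration, and the port and interval are arbitrary; the only real Node.js calls are `http.createServer`, `listen`, `response.writeHead`, and `response.end`.

```javascript
const http = require('http');

// Responses that have arrived but have not been answered yet.
// (Illustrative name, not part of Node.js.)
const waiting = [];

// Called once per request. It returns immediately without replying;
// the response object stays alive inside `waiting`.
function handleRequest(request, response) {
  waiting.push(response);
}

// Called later, from a timer or any other event: answer everyone who is waiting.
function broadcast(message) {
  while (waiting.length > 0) {
    const response = waiting.shift();
    response.writeHead(200, { 'Content-Type': 'text/plain' });
    response.end(message + '\n');
  }
}

// Hypothetical helper: start a server whose clients all get a reply every 5 seconds.
function start(port) {
  setInterval(() => broadcast('tick'), 5000);
  return http.createServer(handleRequest).listen(port);
}

// start(8000);  // then `curl localhost:8000` hangs until the next tick
```

While a request sits in `waiting`, the process is free to accept and stash other requests, which is the whole point.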


What he said... I'll also add that the fact it's written for JavaScript should not be understated: not only is it one of the most widely used and accessible languages around, but I'd wager no other platform has the amount of competitive development that JavaScript does right now, with a slew of engines that improve in performance by orders of magnitude year after year.

I don't know why Ryan didn't just call it Node. I think most people are thrown off by the .js.


This helps a lot (I've been wondering the same thing about node).

But if you're not waiting for database calls (or whatever) before you respond, what is it useful for?

I'm still a bit confused. Some kind of example and comparison (say, with Rails) would help.


Most web servers are bottlenecked on I/O and memory. Assuming the bottleneck is that your app servers are starved for data (as opposed to trying to push out too much data to clients), a big part of your request cycle is waiting: waiting for the db, waiting for memcache, waiting for a web service, waiting for the disk to return some file, etc. The other thing is that your app servers will gobble RAM. This is less true with copy-on-write, but Ruby implementations and Rails are usually very memory hungry. Node is not. Node is more like Rack than Rails, so it isn't fair to compare the two.


No, node.js is not a web server that serves files and folders. It doesn't even have to be a web server. It's simply a collection of libraries written specifically to be asynchronous. It's kind of like Ruby and Python, but with asynchronous libraries and obviously using JavaScript. Asynchronous means non-blocking, so node.js makes use of things like callbacks to prevent your code from blocking. The Wikipedia article would be a good resource for further research.
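
A small sketch of what "callbacks instead of blocking" looks like with the real `fs.readFile` call. The path `/etc/hosts` is only an assumption (any readable file will do, and a missing file is handled), and the `order` array exists purely to show which line runs first.

```javascript
const fs = require('fs');

const order = [];

// Ask for the file and hand over a callback; readFile itself returns right away.
fs.readFile('/etc/hosts', 'utf8', (err, text) => {
  // Runs later, once the read has finished (or failed).
  order.push('callback ran');
  console.log(err ? 'read failed: ' + err.message
                  : 'read ' + text.length + ' characters');
});

// This runs before the callback above, because nothing blocked.
order.push('carried on without waiting');
console.log(order);
```

The second `console.log` prints before the callback's one, since the callback cannot run until the current code has finished.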

(This stuff can get pretty complicated and I don't understand everything fully. Please point out things I got wrong!)


Just to clarify: it is a set of I/O libraries, for socket, HTTP, and file I/O. Other people have built things on top of node like db adapters and templating languages, but at its core node is focused on I/O.


The Wikipedia explanation...

Node.js is an evented I/O framework for the V8 JavaScript engine. It is intended for writing scalable network programs such as web servers.

So, if you need a web server, you need to build it. But as you can see from the front page of the node.js site, writing a simple web server is... simple. ;)
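
For reference, a sketch along the lines of the usual hello-world server (not copied from the site; the names `hello` and `start` and the port number are my own choices):

```javascript
const http = require('http');

// Every request gets the same plain-text reply.
function hello(request, response) {
  response.writeHead(200, { 'Content-Type': 'text/plain' });
  response.end('Hello World\n');
}

const server = http.createServer(hello);

// Listen only when asked, so loading this file has no side effects.
function start(port) {
  server.listen(port, () => console.log('listening on port ' + port));
  return server;
}

// start(8124);  // then visit http://localhost:8124/
```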

Otherwise, the easiest way to understand it is to dig in and play with it. Heroku has Node hosting in beta (invite only) so you might keep an eye out for that. Take a look at howtonode.org for some great tutorials to start with.


You'll get a better answer on stackoverflow.


I'm not sure why people are downvoting you, because you are absolutely right. StackOverflow was built for questions like this; but these questions are not the main reason Hacker News exists.

Pointing the person to a resource that could help him better is valuable. Upvoted.


Just watch Ryan Dahl's jsconf.eu talk. It's great fun and very informative.

http://blip.tv/file/3735944


That one is a lot about Node.js performance. This one is a little better:

http://jsconf.eu/2009/video_nodejs_by_ryan_dahl.html


node.js is basically the JavaScript equivalent of Python's Twisted or Ruby's EventMachine, an asynchronous server toolkit. If the node.js documentation doesn't explain it well enough, look at its "rivals", maybe that helps (well, maybe not in the case of Twisted, if I remember correctly).


When you use Apache/Nginx/etc, you're using a webserver. Usually you will also use a serverside programming language like Python, Ruby, PHP, etc. The webserver is usually programmed using C/C++ and the serverside language is "bolted" on.

With Node.js you have Javascript as a serverside language (like PHP, Python, etc). But included with it you have a library that allows you to create a webserver.

Instead of having a webserver and coupling a serverside language to it, you have a serverside language that is able to spawn its own webserver (although you can choose to use Apache or others).

The biggest advantage, though, comes from the fact that JS is a functional language with a focus on event-based programming. And so Node.js follows that paradigm: you use JS functions that are bound to certain events and run when one of those events happens, instead of running continuously in some kind of loop waiting on I/O, CPU, database, etc. This allows for better performance.
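
A tiny sketch of functions bound to events, using Node's real `EventEmitter` class. The `door` object and the event names are made up for illustration.

```javascript
const { EventEmitter } = require('events');

const door = new EventEmitter();
const log = [];

// Bind functions to events; nothing runs yet.
door.on('open', (who) => log.push(who + ' opened the door'));
door.on('close', () => log.push('door closed'));

// When an event happens, the functions bound to it run.
door.emit('open', 'alice');
door.emit('close');

console.log(log);
```

Sockets, HTTP requests, and file streams in Node expose the same `on(...)` interface, so the handler for "data arrived" is registered the same way as the `open` handler here.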

Plus there's the advantage of being able to program websites using the same language both client-side and serverside. But that's another matter.

-- MV



I understand the asynchronous part and the advantages associated with it, but what baffles me is how it works at the operating system level. If it has to do something asynchronously, shouldn't that be mapped to OS threads? And if so, how does it have a performance advantage over things like Apache, which make use of threads directly?


Two reasons usually given for async IO being more efficient than threaded servers are a) a large chunk of memory (on the order of a MB) must be allocated per thread (and thus per client) and b) the overhead of context switching.

Both of these problems can be solved by things like green threads (which get scheduled on a single OS thread), coroutines, fibers, etc, though Node people will claim any sort of "machinery" like that will add unacceptable overhead. But aren't deeply nested callbacks and their associated closures also a form of "machinery", albeit one which must be explicitly managed by the programmer?

The other argument against thread-like abstractions is the introduction of race conditions associated with shared memory. I believe coroutines sort of solve this, as do shared-nothing / message-passing systems like Erlang's "processes" and HTML5 Web Workers (and of course OS processes, though they're too heavyweight).

[just realized the previous 2 paragraphs don't answer your question, sorry]

Anyway, instead of blocking on a single socket operation per thread you can use select() and other more efficient APIs (poll, epoll, kqueue) to wake a single main thread when any of the sockets (out of hundreds or thousands) are ready for reading or writing.

When there is no asynchronous API for some operation (as with many filesystem operations or existing database clients, for example), a thread pool is indeed used to make that operation asynchronous to the rest of the application. However, depending on things like the disk cache, the cost of context switching might not be negligible, so Node now offers synchronous filesystem APIs as well.


Thanks. Did not know that async APIs at the OS level existed.



The following is what I understood while writing a Tornado-like server implementation in PHP. Please correct me if I'm wrong.

-----

If you consider a normal application/process, the computation done there is usually very fast. For example, take a blog post being displayed. Once everything (the data) is there, wrapping the content in a template and creating the HTTP response takes only a few milliseconds. However, disk IO, database IO, network IO, etc. take quite a lot of time compared to computing. The process is usually idle while waiting for those resources.

Now, consider Apache. It creates a TCP socket and starts listening. Once a client makes a connection, it spawns a new thread to handle that connection. But since the thread is waiting for resources like network IO, it stays idle for a while. Now, imagine you are getting a lot of requests, like 1000 requests per second. If finishing one web request (processing + waiting) takes about 500ms, Apache will need about 500 threads alive at any moment (1000 requests per second × 0.5 seconds each). Creating a thread carries a certain amount of overhead, and as the thread count goes higher, everything becomes much slower.

Now, consider Node.js as a web server. First, the Node.js server creates a listening socket. On Unix platforms there is a kernel mechanism called "poll" (I think epoll is used in Node.js). Instead of waiting for a connection to send data, you can just register the connection with the poll API and continue with your program. That's what happens in Node.js: it creates the first listening socket, registers it, and drops into a loop. Inside the loop, it calls the poll API again with a very low (maybe even 0, not sure) waiting time. If there is any data on a registered socket, the API reports it, so you take the new data and start processing it.

For the listening socket, "new data" means a new client: the server creates a new socket connection for it, registers that with the poll API too, and carries on as normal. When HTTP request data arrives on that second socket, the main loop gets notified by the epoll API and jumps into the HTTP request handling function. This function takes the available data, appends it to any earlier data from that socket, and checks whether there is enough for a complete request. If there is not, it simply saves the data in memory and returns to the main loop. If there is, it computes the response and sends it back.
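
Here is a rough sketch of just the "append the data and check whether the request is complete" step. It is a simplification under my own assumptions: it only looks for the blank line that ends HTTP headers and ignores request bodies, and the names `buffers` and `onData` are invented. Node's real HTTP parser is far more thorough.

```javascript
// Data received so far for each connection, keyed by a connection id.
const buffers = new Map();

// Called by the main loop each time a socket has new data.
// Returns the request head once the blank line ending the headers
// has arrived; otherwise saves the data and returns null.
function onData(connectionId, chunk) {
  const data = (buffers.get(connectionId) || '') + chunk;
  const end = data.indexOf('\r\n\r\n');
  if (end === -1) {
    buffers.set(connectionId, data);  // not enough yet: back to the main loop
    return null;
  }
  buffers.set(connectionId, data.slice(end + 4));  // keep bytes after the headers
  return data.slice(0, end);
}

// Two connections whose data arrives interleaved:
console.log(onData(1, 'GET / HT'));                             // incomplete
console.log(onData(2, 'GET /other HTTP/1.1\r\n'));              // incomplete
console.log(onData(1, 'TP/1.1\r\nHost: example\r\n\r\n'));      // complete head
```

Each call returns quickly either way, so one thread can interleave many half-finished requests.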

Something quite similar happens if generating the response needs any disk or database content. Those functions just say "I want access to these resources; tell me once the data is back", and then the main loop takes over.

So, the main difference is that the async model eliminates the overhead created by thousands of threads/processes and handles many connections in a single thread. The trick is understanding that computing takes only a really small amount of time (comparatively). As I understand it, you can't do heavy processing inside the request handlers (like sorting 100,000 numbers using bubble sort), because this will halt all other requests. Of course, doing the same on a threaded server will cause severe lag in other requests, but that's because of the heavy CPU usage. In Node.js's case, it's simply not even trying to process other requests, because it's counting on the request handler to finish up quickly or to return to the main loop while it's waiting.
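
A sketch that shows the "heavy processing halts everything else" effect on a small scale. A zero-delay timer stands in for "another request that is ready to be handled"; the array size of 5000 is arbitrary, chosen so the sort takes a noticeable but short time.

```javascript
const events = [];

// Stand-in for another request that is ready to be served right now.
setTimeout(() => events.push('timer fired'), 0);
events.push('timer scheduled');

// Heavy synchronous work inside the "handler": a plain bubble sort.
function bubbleSort(numbers) {
  const a = numbers.slice();
  for (let i = 0; i < a.length; i++) {
    for (let j = 0; j < a.length - 1 - i; j++) {
      if (a[j] > a[j + 1]) {
        const tmp = a[j];
        a[j] = a[j + 1];
        a[j + 1] = tmp;
      }
    }
  }
  return a;
}

// Worst case for bubble sort: the numbers 5000..1 in descending order.
const input = Array.from({ length: 5000 }, (_, i) => 5000 - i);
const sorted = bubbleSort(input);
events.push('sort finished');

// The timer was due long ago, but it cannot run until this code returns.
console.log(events);
```

At the point of the final `console.log`, `events` does not yet contain `'timer fired'`: the callback only gets its turn once the synchronous code has returned to the main loop.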


For web development purposes, you can think of Node as Apache and PHP rolled into one: imagine writing the whole application inside httpd.conf, using JavaScript instead of XML.


I found this



