

Ask HN: I really don't understand node.js, could someone explain it to me? - barredo

That's it. I guess I think of node.js as a traditional webserver where you set up files in a folder and you are good to go.

Is this the case? How is node.js different from Apache/Nginx/Lighttpd/Cherokee/etc.?

Thanks.
======
asolove
So let's say you write a web app in Rails or Django. Of course all sorts of
magic happens behind the scenes, but ultimately the whole thing boils down to
a function, ( def handle_request(request) { return ... } ), which is given the
request and returns a response. If you have to wait for something (the
database or whatever), that function has to sit around waiting for it
(i.e. "blocking") before returning the answer.

In Node.js, however much framework magic you have, it ultimately boils down
to a function that gets a request and a response proxy ( function
process_request(request, response) { } ). The function can do whatever it
wants with this response object: send data right away, stash it somewhere to
be picked up later, have it send data in response to an event in the future,
whatever. Then the function returns and gets out of the way, and other stuff
runs. But the response object is still sitting around, and whenever you tell
it to do something, you can send some (or all) of a response back.
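A minimal sketch of that pattern (processRequest and fakeDbQuery are illustrative stand-ins, not Node's real HTTP API): the handler returns immediately, and the stashed response object is completed later, once the "database" call finishes.

```javascript
// The handler registers a callback and returns right away; it never blocks.
function processRequest(request, response) {
  fakeDbQuery(request.userId, function (err, user) {
    // This runs later, from the event loop -- the handler itself
    // returned long ago.
    response.end("Hello, " + user.name);
  });
  // Falls through here immediately; the response is still pending.
}

// Stand-in for an asynchronous database client.
function fakeDbQuery(id, callback) {
  setImmediate(function () {
    callback(null, { name: "user-" + id });
  });
}
```

In a blocking framework, the equivalent handler would sit inside the database call until it returned; here the function has already exited by the time the data arrives.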

~~~
callmeed
This helps a lot (I've been wondering the same thing about node).

But if you're not waiting for database calls (or whatever) before you respond,
what is it useful for?

I'm still a bit confused. Some kind of example and comparison (say, with
Rails) would help.

~~~
aaronblohowiak
Most web servers are bottlenecked on I/O and memory. Assuming the bottleneck
is that your app servers are starved for more data (as opposed to trying to
push out too much data to clients), then a big part of your request cycle is
waiting. Waiting for the db, waiting for memcache, waiting for a web service,
waiting for the disk to return some file, etc. The other thing is that your
app servers will gobble RAM. This is less true with copy-on-write, but Ruby
implementations and Rails are usually very memory hungry. Node is not. Node is
more like Rack than Rails, so it isn't fair to compare the two.

------
sandaru1
The following is what I understood while writing a Tornado-like server
implementation in PHP. Please correct me if I'm wrong.

-----

If you consider a normal application/process, the computation done there is
usually very fast. For example, take a blog post being displayed. Once
everything (the data) is there, wrapping the content in a template and
creating the HTTP response takes only a few milliseconds. However, disk I/O,
database I/O, network I/O, etc. take quite a lot of time compared to
computing. The process is usually idle while waiting for those resources.

Now, consider Apache. It creates a TCP socket and starts listening. Once a
connection is made by a client, it spawns a new thread for handling that
connection. But since the thread is waiting on resources like network I/O, it
stays idle for a while. Now, imagine you are getting a lot of requests, say
1000 requests per second. If finishing one web request (processing + waiting)
takes about 500ms, Apache will need about 500 threads alive at any given
moment. Creating a thread carries a certain amount of overhead, and as the
thread count goes higher, everything gets much slower.

Now, consider Node.js as a webserver. First, the Node.js server creates a
listening socket. On Unix platforms, there is a kernel mechanism called
"poll" (I think epoll is what Node.js uses). Instead of waiting for a
connection to send data, you can just register the connection with the poll
API and continue with your program. That's what happens in Node.js: it
creates the first listening socket, registers it, and drops into a loop.
Inside the loop, it calls the poll API again with a very low (maybe even 0,
I'm not sure) waiting time. If there is any data on a registered socket, the
API notifies you, so you take the new data and start processing it. When the
activity is on the listening socket, it accepts a new socket connection,
registers that with the poll API too, and continues the loop as normal. When
HTTP request data arrives on the second socket, the main loop gets notified
by the epoll API and jumps into the HTTP request handling function. This
function takes the data available, appends it to any old data sent from that
socket, and checks whether there is enough for a complete request. If there
is not, it simply saves the data in memory and returns to the main loop. If
there is, it computes the response and sends it back.

Something quite similar happens if generating the response needs any disk or
database content. Those functions just say, "I want access to these
resources; tell me once the data is back," and then the main loop takes over.

So, the main difference is that the async model eliminates the overhead
created by thousands of threads/processes and handles many connections in a
single thread. The trick here is understanding that the computing only takes
a really small amount of time (comparatively). As I understand it, you can't
do heavy processing inside the request handlers (like sorting 100,000 numbers
using bubble sort); this will halt all other requests. Of course, doing the
same on a threaded server will also cause severe lag in other requests, but
that's because of the heavy CPU usage. In Node.js's case, it's simply not
even trying to process other requests, because it's counting on the request
handler to finish up quickly, or to return to the main loop when it's
waiting.
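The "append new data, check for a complete request, otherwise save it and return to the loop" step described above can be sketched like this (illustrative only; Node's http module does this parsing for you, and the blank line terminating HTTP headers is used as the "complete request" marker):

```javascript
// Per-connection buffers, keyed by a socket identifier.
var buffers = {};

function onData(socketId, chunk) {
  // Append the new chunk to whatever this socket sent earlier.
  buffers[socketId] = (buffers[socketId] || "") + chunk;

  var end = buffers[socketId].indexOf("\r\n\r\n");
  if (end === -1) {
    return null; // incomplete: save the data and return to the main loop
  }

  // Complete request: hand it off for processing and clear the buffer.
  var request = buffers[socketId].slice(0, end);
  delete buffers[socketId];
  return request;
}
```

The event loop would call onData each time epoll reports readable data; only when a full request has accumulated does any real processing happen.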

------
cmelbye
No, node.js is not a web server that serves files and folders. It doesn't even
have to be a web server. It's simply a collection of libraries written
specifically to be asynchronous. It's kind of like Ruby and Python, but with
asynchronous libraries and obviously using JavaScript. Asynchronous means
non-blocking, so node.js makes use of things like callbacks to prevent your
code from blocking. The Wikipedia article would be a good resource for further
research.

(This stuff can get pretty complicated and I don't understand everything
fully. Please point out things I got wrong!)

~~~
aaronblohowiak
Just to clarify: it is a set of I/O libraries, for socket, HTTP, and file I/O.
Other people have built things on top of Node like db adapters and templating
languages, but at its core Node is focused on I/O.

------
gexla
The Wikipedia explanation...

Node.js is an evented I/O framework for the V8 JavaScript engine. It is
intended for writing scalable network programs such as web servers.

So, if you need a web server, you need to build it. But as you can see from
the front page of the node.js site, writing a simple web server is... simple.
;)

Otherwise, the easiest way to understand it is to dig in and play with it.
Heroku has Node hosting in beta (invite only), so you might keep an eye out
for that. Take a look at howtonode.org for some great tutorials to start with.

------
greenlblue
You'll get a better answer on stackoverflow.

~~~
techiferous
I'm not sure why people are downvoting you, because you are absolutely right.
StackOverflow was built for questions like this; but these questions are not
the main reason Hacker News exists.

Pointing the person to a resource that could help him better is valuable.
Upvoted.

------
dreyfiz
Just watch Ryan Dahl's jsconf.eu talk. It's great fun and very informative.

<http://blip.tv/file/3735944>

~~~
briancooley
That one is a lot about Node.js performance. This one is a little better:

<http://jsconf.eu/2009/video_nodejs_by_ryan_dahl.html>

------
mhd
node.js is basically the JavaScript equivalent of Python's Twisted or Ruby's
EventMachine, an asynchronous server toolkit. If the node.js documentation
doesn't explain it well enough, look at its "rivals", maybe that helps (well,
maybe not in the case of Twisted, if I remember correctly).

------
mvalente
When you use Apache/Nginx/etc., you're using a webserver. Usually you will
also use a serverside programming language like Python, Ruby, PHP, etc. The
webserver is usually programmed in C/C++ and the serverside language is
"bolted" on.

With Node.js you have Javascript as a serverside language (like PHP, Python,
etc). But included with it you have a library that allows you to create a
webserver.

Instead of having a webserver and couple a serverside language to it, you have
a serverside language that is able to spawn its own webserver (although you
can choose to use Apache or others).

The biggest advantage, though, comes from the fact that JS is a functional
language with a focus on event-based programming. And so Node.js follows that
paradigm: you use JS functions that are bound to certain events and run when
an event happens, instead of running continuously in some kind of loop
waiting on I/O, CPU, database, etc. This allows for better performance.

Plus there's the advantage of being able to program websites using the same
language both client-side and server-side. But that's another matter.

-- MV

------
DjDarkman
Just watch this:
<http://developer.yahoo.com/yui/theater/video.php?v=dahl-node>

------
barfoomoo
I understand the asynchronous part and the advantages associated with it, but
what baffles me is how it works at the operating system level. If it has to
do something asynchronously, shouldn't that be mapped to OS threads? And if
so, how does it have a performance advantage over things like Apache, which
use threads directly?

~~~
tlrobinson
Two reasons usually given for async IO being more efficient than threaded
servers are a) a large chunk of memory (on the order of a MB) must be
allocated per thread (and thus per client) and b) the overhead of context
switching.

Both of these problems can be solved by things like green threads (which get
scheduled on a single OS thread), coroutines, fibers, etc, though Node people
will claim any sort of "machinery" like that will add unacceptable overhead.
But aren't deeply nested callbacks and their associated closures also a form
of "machinery", albeit one which must be explicitly managed by the programmer?

The other argument against thread-like abstractions is the introduction of
race conditions associated with shared memory. I believe coroutines sort of
solve this, as do shared-nothing / message-passing systems like Erlang's
"processes" and HTML5 Web Workers (and of course OS processes, though they're
too heavyweight).

[just realized the previous 2 paragraphs don't answer your question, sorry]

Anyway, instead of blocking on a single socket operation per thread you can
use select() and other more efficient APIs (poll, epoll, kqueue) to wake a
single main thread when any of the sockets (out of hundreds or thousands) are
ready for reading or writing.

When there is no asynchronous API for some operation (as with many filesystem
operations or existing database clients, for example), a thread pool is
indeed used to make that operation asynchronous to the rest of the
application. (However, depending on things like the disk cache, the cost of
context switching might not be negligible, so Node now offers synchronous
filesystem APIs as well.)

~~~
barfoomoo
Thanks. Did not know that async APIs at the OS level existed.

------
udfalkso
I found this post helpful:
<http://debuggable.com/posts/understanding-node-js:4bd98440-45e4-4a9a-8ef7-0f7ecbdd56cb>

------
mean
For web development purposes, you can think of Node as Apache and PHP rolled
into one: imagine writing the whole application inside httpd.conf, using
JavaScript instead of XML.

