This purist evented I/O fundamentalism has to stop.
While evented I/O is great for a certain class of problems (building network servers that move bits around in memory and across network pipes at both ends of a logic sandwich), it is a totally asinine way to write most logic. I'd rather deal with threading's POTENTIAL shared mutable state bullshit than have to write every single piece of code that interacts with anything outside of my process in async form.
In node, you're only really saved from this if you don't have to talk to any other processes and you can keep all of your state in memory and never have to write it to disk.
Further, threads are still needed to scale out across cores. What the hell do these people plan on doing when CPUs are 32 or 64 core? Don't say fork(), because until there are cross-process heaps for V8 (aka never), that only works for problems that fit well into the message-passing model.
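(For context, the message-passing model in question is node's child_process.fork() channel; a minimal sketch, with the worker file name and payload made up:)

var cp = require('child_process');
var worker = cp.fork(__dirname + '/worker.js'); // worker.js is hypothetical

worker.on('message', function (m) {
  console.log('result from worker:', m);
});

// Everything crossing the process boundary gets serialized;
// there is no shared heap between the two V8 instances.
worker.send({ job: 'hash', input: 'some data' });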
redis is pretty badass for talking to other processes without really talking to other processes, and it's the #10 most depended-upon library on npm right now. http://search.npmjs.org/
It won't work for every problem, of course.
dnode is a good way to easily talk to other node.js processes without the HTTP overhead. It can talk over HTTP too, with socket.io.
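(For the unfamiliar, a sketch of the dnode style, loosely after its README; the port and function names are illustrative:)

var dnode = require('dnode');

// process A: expose a function over the wire
dnode({
  transform: function (s, cb) {
    cb(s.toUpperCase());
  }
}).listen(5004);

// process B: call it almost as if it were local
dnode.connect(5004, function (remote) {
  remote.transform('beep', function (result) {
    console.log(result); // "BEEP"
  });
});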
node-http-proxy is useful as a load balancer, and a load balancer can distribute work between cores.
Finally, most of the node.js people I've met, online and offline, are polyglots, and are happy to pick a good tool for a job. But right now node.js has great libraries for realtime apps, the ability to share code on the client and server in a simple way, and good UI DSLs like jade, less, and stylus.
Huh? How does any of this keep me from having to write callback spaghetti? If I send a call using redis or dnode or whatever, I have to wait for it, and that means a callback.
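(Concretely, any node_redis lookup has roughly this shape; a sketch, with the key and handleError made up:)

var redis = require('redis');
var client = redis.createClient();

// The value never arrives as a return value; the only way to "wait"
// is to put the rest of the logic inside the callback.
client.get('id:3244', function (err, myThing) {
  if (err) return handleError(err); // handleError is a stand-in
  // ...everything that needs myThing has to live in here
});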
I feel you about the polyglot and tend to agree, but I think some people are really trying to force awkward things into node, like people attempting to write big full-stack webapps using it.
I didn't mention any of those libraries in my reply to your original comment. To help alleviate callback noise, there's https://github.com/caolan/kanso, which Fenn mentions in his blog post's comments, and there are EventEmitters, and there is Function.bind() and Backbone.js-style async, where you have methods that describe what each callback does. (Backbone is usable in Node.js but perhaps not as practical on the server as it is on the client.)
Also, callbacks to me are kind of like parentheses in Lisp. They're annoying, but they're for the greater good. :)
"deal with it" is a good response. personally, I'd rather write callbacks than deal with threads any day. And there are plenty of approaches to make dealing with callbacks easier (async.js is one of many).
Actually, "deal with it" is fucking lame, isn't a response at all, and is a good example of why I use the term "fundamentalism."
Is your problem with threads or shared mutable state? Web applications should be stateless and can be written as long request-response pipelines on top of a pool of actor threads, with the only shared state existing at either end of the pipeline, probably hidden by a framework anyways.
The "deal with it" is the same for threads - if you're using a system with threads you have to deal with them and it's generally not too fun. Node is what it is. It's callback based. I didn't mean the "deal with it" in a hostile way, more like that's the kind of system node is so deal with it. Is node great for every kind of problem? No. Definitely not. Actors are great too (and they've usually got threads in them, but they are well contained). But for certain kinds of apps node and the careful use of callbacks work out great. I guess I was trying to counter all of the hating on callbacks - they're not so awful when you get used to them.
I'll definitely admit that many of the inventive techniques that node.js users have come up with make dealing with callbacks less absurd, but the problem is that it's just polishing shit. It's extra hoops to jump through with no benefit, outside of the C10K fapfest.
Would all the developers writing apps on node.js who are doing 10,000+ concurrent requests per process please stand up?
Only the minority gives a shit how many Hello World requests your stack can pump out. Some of us use node.js because it's a) got some decent libraries, b) makes realtime easy, c) uses the same code server+client-side reducing context-switches, and d) gets quite enjoyable once you know what you're doing.
You obviously haven't written a lot of async code so not sure why you're so against the idea.
So again, to you, what benefit does everything being built on this async, single-threaded, event-driven model give you? Sounds like not much. You could get all of those benefits + much cleaner code using threading or fibers or actors, but that does not make for C10K badassness, so here we are, with our callbacks.
Also, I'd say I've written enough code on top of node.js to be qualified to comment on this. Here's some of it that's open source:
I get simple scalability, server-client code reuse, easy realtime, and, despite what you or others may think, I like Javascript as a language. Node.JS was also a perfect fit for the scraping framework I wrote: https://github.com/chriso/node.io
IMO async code isn't as difficult or ugly as you make it out to be. Is async code as easy to write and follow as sync code? No. Is it worth the benefits I've mentioned? For me, yes.
> Is it worth the benefits I've mentioned? For me, yes.
This sounds like a tacit admission that you're willing to deal with it because you don't think you have other options, but you do. There is at least one CPS compiler for Node (TameJS), and there are other languages that allow for the same result with more straightforward implementations of concurrent code (Erlang, OCaml/LWT, Haskell). I'm not saying you should use them, but we can do better and we should, even if it's just compiling back to JS in the end.
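(For the curious, TameJS source looks roughly like this; the function names are placeholders, and it compiles down to plain callback JS:)

var res1, res2;
await {
  doOneThing(defer(res1));   // both calls are started here...
  andAnother(defer(res2));
}
// ...and execution only reaches this line after both defers have fired
thenDoSomethingWith(res1, res2);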
If I can run one app with 10,000+ concurrent requests on one server, then I can run a hundred apps with a hundred concurrent requests each on that same server. You can say that doesn't matter either, but the cost of hosting a web app just dropped 99%.
And now the cost of hosting a hundred webapps is just a rounding error in contrast to the cost of developing one. Isn't it nice living in the future?
The additional benefit is that I can take the same program and handle 20,000+ concurrent users on two servers— which is when I suddenly become very glad that my hardware costs are significant compared to my dev costs.
And now the cost of hosting a hundred webapps is just a rounding error in contrast to the cost of developing one.
That's a weird way to look at it, unless you're in the webapp hosting business? For everyone else there is usually only one webapp that they care about.
I can take the same program and handle 20,000+ concurrent users on two servers
Sorry to break it to you, but that's not how it works. Unless you have one of those rare webapps that never need to touch a database.
Really? Show of hands, now, who here cares about one (and only one) webapp?
Anyway, what's good for the webapp hosting business is good for web developers, and what's good for web developers is good for the technical ecosystem in general (and then the world). Of course going from VPSes to EC2 was a significant improvement, but that isn't as good as it gets. EC2 rates were cheap already, but when AWS started the free tier it represented a significantly lower barrier to entry. That's good for everyone.
And seriously, come on. This is a way of making programs run faster, and not a little faster, but a hundred times faster. It's the very definition of technological progress. It's absurd that we're here arguing about whether it matters or not.
This is a way of making programs run faster, and not a little faster, but a hundred times faster.
Sorry, but if anything, that statement is absurd.
Faster than what? And where's that "hundred times faster" figure coming from?
It seems there's a bit of a misconception about the bottlenecks and cost structure in real world web applications.
Rails (aka the slowest web framework known to man) is popular because it trades hardware for development velocity. Hardware is cheap, developer salaries are not.
It's cheap until it's not. At a certain point, you just can't process more requests at once in Rails. That's your limit. And it's not much— 100, maybe.
But node multiplies that, a lot. Which is nice, because you know it won't break or slow down if a bunch of people use it for some reason. And so you don't have to re-architect your system for a while longer, which is valuable time.
Yes. Rails is measured in hundreds per second. Node in thousands per second.
The point that you still seem to be missing is that the monetary amounts involved have normally turned into a rounding error long before you reach a traffic-volume where this difference becomes relevant.
Or, in other words, hosting a "webapp" already is nearly free in terms of hardware.
I have a server monitoring application that does this. Currently I've tested it on a VM with 4000+ clients and 1 server (all on the same VM), without any errors (CPU load spikes, but memory usage was actually low; I believe ~200MB for the entire test).
4000+ requests are sent to the server every second in total, and the server is a single node process.
This is a pretty common pattern for any work you have to do asynchronously. Pretty much all libraries should implement this for you, so the first three lines should be all the code you write:
getSomething("id", function(thething) {
// one true code path
});
function getSomething(id, callback) {
var myThing = synchronousCache.get("id:3244");
if(myThing) {
callback(null, myThing);
} else {
async(id, callback);
}
}
A minor quibble with language style isn't exactly what I would call "A Giant Step Backwards".
Amen to that. The power of Node is this language style already makes tons of sense to experienced front-end developers who deal with async flow for everything they do.
I was under the impression that you could not do _anything_ synchronous? What if the call blocks for 100ms? or 1000ms? Won't that delay all other clients and all other requests?
It's kind of a dick move to make an inflammatory, flamebait blog post and then later admit you didn't actually know what you were talking about. Maybe you should have actually understood the technology you were working with before making knee-jerk snap judgements.
It's not an inflammatory blog post. The headline might be a little annoying to some, but not to most of us. There are negative headlines about every programming language platform.
Also it's a reminder for Node.js developers to make good use of async patterns, lest their code look silly. Besides async, which Fenn mentions in the comments, there's EventEmitters and Backbone.js for doing different styles of async programming. And there are a few other libraries that are a lot like https://github.com/caolan/async .
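(A sketch of the EventEmitter style, reusing the article's placeholder cache; the event names are illustrative:)

var EventEmitter = require('events').EventEmitter;

// Name the outcomes as events instead of nesting anonymous callbacks.
var lookup = new EventEmitter();
lookup.on('thing', function (myThing) { /* render it */ });
lookup.on('error', function (err) { console.error(err); });

asynchronousCache.get('id:3244', function (err, myThing) {
  if (err) return lookup.emit('error', err);
  lookup.emit('thing', myThing);
});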
Sure, everyone has to start somewhere, and to be perfectly honest, I ran into many of the same issues.
The difference is in what happened afterward. What I did then was to try to understand why Node.js did things differently and how I could accomplish my goals in an idiomatic way. I didn't try to shoe-horn in my existing mental framework for how things should work, and then throw my hands in the air when they didn't.
However, what I didn't do was immediately run to Alert the Internets about what "A Giant Step Backward" Node.js is.
In my experience Node.js is more difficult than synchronous code. But it's also, by far, the easiest way to get something running that's massively parallel.
I recently wrote a project that needs to do 100's or 1000's of possibly slow network requests per second. The first try was Ruby threads. That was a disaster (as I should have predicted). I had an entire 8-core server swamped and wasn't getting near the performance I needed.
The next try was node. I got it running and the performance was fantastic. A couple orders of magnitude faster than the Ruby solution and a tenth of the load on the box. But, all those callbacks just didn't sit right. Finding the source of an exception was a pain and control flow was tricky to get right. So, I started porting to other systems to try to find something better. I tried Java (Akka), EventMachine with/without fibers, and a couple others (not Erlang though).
I could never get anything else close to the performance of Node. They all had the same problems I have with Node (mainly that if something breaks, the entire app just hangs and you never know what happened), but they were way more complicated, _harder_ to debug, and slower.
I have a new appreciation for Node now. And now that I'm much more used to it, it's still difficult to do some of the more crazy async things, but I enjoy it a lot more. It's a bit of work, and you have to architect things carefully to avoid getting indented all the way to the 80-char margin on your editor, but you get a lot for that work.
> In my experience Node.js is more difficult than synchronous code. But it's also, by far, the easiest way to get something running that's massively parallel.
It's asynchronous, not actually parallel. Only a single CPU core will be used in node.js.
However, tasks that are waiting on asynchronous I/O let other tasks run in the meantime, which can feel like parallelism.
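(A sketch of that interleaving; the host and paths are made up:)

var http = require('http');

// Both requests are in flight at once, so the total wall-clock time is
// roughly max(a, b) rather than a + b; but each callback still runs
// one at a time on the single thread.
http.get({ host: 'example.com', path: '/a' }, function (res) {
  console.log('a done');
});
http.get({ host: 'example.com', path: '/b' }, function (res) {
  console.log('b done');
});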
Have you tried Haskell? Why haven't you tried Erlang? They're both good choices for writing applications that make and receive thousands of concurrent requests, because they have very fast lightweight threads and good exception handling, and you don't need to write the kind of code that Node.js forces you to write.
Node is appealing because Javascript is ubiquitous, but I doubt it's harder to learn Erlang than it is to learn to write everything in nested callbacks.
That's why I mentioned that I hadn't tried Erlang. It was one of my first choices, but the library support wasn't there. Tolerant HTML parsing is a requirement for the project, and there don't seem to be any good parsers for Erlang.
> I recently wrote a project that needs to do 100's or 1000's of possibly slow network requests per second. The first try was Ruby threads. That was a disaster (as I should have predicted). I had an entire 8-core server swamped and wasn't getting near the performance I needed.
I don't mean to be offensive, but welcome to at least the 1980s. We've known this doesn't scale for ages. The fact that you even tried it and thought it might be a viable solution just shows your education has failed you. I am highly biased against Node, I think it is a giant step backwards. Every blog post I have read that says the opposite admits they have no experience in anything else so they just default to Node being good. I only hope Node is a fad.
> The fact that you even tried it and thought it might be a viable solution just shows your education has failed you
This holier-than-thou attitude is exactly the thing that prevents more people from becoming educated on these kinds of subjects. Knowledge and experience on these kinds of subjects are _not_ trivial and are _not_ easy to obtain! Information about what scales, what does not, and why, is scattered all over the place and difficult to find. It may be very obvious to you after you already know it, but it's really not. If, instead of spending so much time on declaring other people dumb or uneducated, people would spend more time on educating other people, then the world would be much better off.
If I told you that the Sun revolved around the earth, would you consider it holier-than-thou to tell me my education has failed me? The knowledge that the original solution (threads in Ruby for handling large numbers of concurrent connections) does not work is not a secret. Even with a peripheral following of Ruby it is well known that this solution would not work, not only is the base Ruby implementation slow but threads have been a known issue in it from day one. The solution is so misaligned with the problem that it is really difficult to validly argue this knowledge is too difficult to find. Yes, scaling to Twitter is another issue, but we're talking about the basics here.
But don't put words in my mouth, I didn't call anyone dumb, I said his education has failed him. This could be himself failing to properly research the problem space, it could be his school for not properly introducing him to the subject, it could be a whole host of things. I never argued he was incapable of learning (clearly he did). And I do spend a lot of time educating people, don't take a singular snapshot of a comment on HN as indication of how my entire life is spent.
If 99% of the world believed that the Sun revolved around the Earth, and knowledge of the truth were scattered across a handful of monks in obscure monasteries, then yes, that would indeed be a holier-than-thou attitude. And that's exactly what's going on with scaling knowledge.
I would even argue that the Ruby threading problems he's experiencing may not necessarily be because Ruby threads don't scale, but possibly because he's using them wrong or because he's not using the right version of Ruby. Ruby 1.8 uses select() to schedule I/O between threads, so the more threads and the more sockets you have, the slower things become, because select() is linear time. The use of select() also results in a hard limit of about 1024 file descriptors per Ruby 1.8 process. Also, context switching in Ruby 1.8 requires copying the stack. Ruby 1.9 is much better in this regard since it uses native threads and no longer uses select() to schedule threads that are blocked on I/O.
I'm running a multithreaded, multiprocess Ruby (1.8!) analytics daemon that generates 12 GB of data per day. It flies. VMware CloudFoundry's router is written in Ruby + EventMachine. That thing has to process tons and tons of requests, and they've found Ruby + EventMachine to be fast enough.
To simply say "Ruby doesn't scale and is slow" is too simplistic, and ignoring the underlying, more complex reasons would result in one bumping against the same problems in a different context. So no, it isn't so obvious from day 1 that using Ruby would be a problem.
If you think scaling to a few thousand concurrent connections is some closely guarded monk secret, then you are making my point all the stronger: our education is failing. The C10K document was released in 1999. That is 12 years we've known how to handle 10k concurrent connections (bear in mind this is not the exact use case mentioned above). Twisted came out a similar time ago. Libevent dates back to at least 2005. Erlang first appeared in 1986, and Erlang was just abstracting into a language how people were already solving concurrency issues. This stuff is not hidden or elite or privileged information. How is it that your average HNer can spout off reasons why NoSQL is claimed to be superior to an RDBMS for scalability, yet doesn't seem to know the fundamental concepts? Our education is indeed failing the current generation of programmers if they don't know the basics of a topic that the internet is flooded with.
Finally, I didn't say "Ruby is too slow and doesn't scale"; I pointed out that the various issues with doing things fast in Ruby have been known for a long time, even to someone only casually following Ruby. What I did say was that the basic approach the original commenter chose is known not to scale (which it didn't). This is a fundamentally different approach from the VMware product you mentioned, which chose a solution similar to Twisted. That approach is known to scale far better than the original solution.
I read C10K years ago. I don't consider it a useful educational document. It doesn't go deep enough into the subject and it doesn't describe well enough why things like threading don't work well. I had to find all of that out through experience (writing a web server myself), and even now some things are still blurry. C10K is very old; it doesn't describe recent advances in threading implementations. Why are threads unscalable? Is it because of context switching overhead? Because of the VM addresses required for the thread stacks? Because of frequent CPU cache flushes? C10K doesn't go into those kinds of details, and it doesn't look like many people know the answer. I suppose I could find out by studying the kernel, but frankly I have better things to do with my time, like actually writing software and making a living.
Furthermore C10K is not the complete picture. It describes only connection management, not what you actually do with the connection. The latter plays a non-trivial role in actual scalability.
Nobody said C10K is the end-all-be-all on scaling, nor should it be. It is also false that it doesn't present any arguments for why threads don't scale well. According to the change log the latest change was in 2011, which added information on nginx, but the last change prior to that was 2006. That version of the document talks about memory limitations on using threads because of the stack size. While not extensive, it also includes a quote from Drepper about context switches. It even points to documents that are pro-threading. My point is that it's been around for 12 years and is easily found and read by anyone. It also contains a plethora of links to more information on the subject. To claim that C10K is not a useful educational document is absurd. But C10K is one document; there are numerous ones out there, a Google search away, and our knowledge of scaling has anything but stagnated in the last 12 years. Your options aren't C10K or read the kernel source for your favorite OS.
If you're going to argue that those with knowledge need to distribute that knowledge better, that's fine. Knowledge can almost always be distributed better; perhaps someone could make a nice centralized website that has better information than highscalability.com. But at the same time you've just told me that a document that is a great introductory resource on scaling connection handling is not a "useful educational document". You may have better things to do with your time than read kernel source, but is your time so precious you can't do some Google searches? Perhaps read an industrial white paper or academic paper on the subject of scalability? You can write all the software you want, but if you're ignorant of how to overcome scalability problems, are you accomplishing much? And if you're doing tests and learning about what scales but keeping it to yourself, you are just as culpable for not educating people.
Known that what doesn't scale? Threads? Sure they do. I could have made Ruby threads work, I guarantee you. Move the network requests into a giant thread pool, and then do everything else synchronously. It was the context switching that was killing me, but you can use threads in such a way that that's not a problem. It's just a lot of work. More work than just using Node, which pretty much does exactly that for me for free.
And on a side note "your education has failed you"? Seriously? You can't just preface something with "I don't mean to be offensive" and then say whatever you like. I don't mean to be offensive, but get yourself some social skills.
I don't understand why saying your education has failed you is considered offensive. In the US we say that about high schoolers all the time but it's a knock on the education system not the students.
Citing your sources would be a great way of asserting your credibility. "Every blog post I have read that says the opposite" doesn't count.
I'd love to read some well-thought-out arguments against node from people who've seriously given node a shot, but I haven't seen any. Granted, I haven't been looking for them, so please prove us wrong.
I don't log the numerous NodeJS blogs I have come across over the years, and my opinions on whether the authors of Node blogs are qualified aren't really relevant to my point anyway: NodeJS is a step backwards.
I wrote a comment on reddit that expands on my reasons more, although the second point is less of a problem if people use something like TameJS with Node (which I don't think most people are doing).
It seems a bit cruel that he mentions "horror stories" about Twisted; most of the culture shock people complain about with Twisted is exactly the kind of flow-control shenanigans that he describes in Node.js. In fact, Twisted makes those particular examples easier.
To handle branching flow-control like 'if' statements, Twisted gives you the Deferred object[1], which is basically a data structure that represents what your call stack would look like in a synchronous environment. For example, his example would look something like this, with a hypothetical JS port:
var d = asynchronousCache.get("id:3244"); // returns a Deferred

d.addCallback(function (result) {
  if (result == null) {
    return asynchronousDB.query("SELECT * from something WHERE id = 3244");
  } else {
    return result;
  }
});

d.addCallback(function (result) {
  // Do various stuff with myThing here
});
Not quite as elegant as the original synchronous version, but much tidier than banging raw callbacks together - and more composable. Deferred also has a .addErrback() method that corresponds to try/catch in synchronous code, so asynchronous error-handling is just as easy.
For the second issue raised, about asynchronous behaviour in loops, Twisted supplies the DeferredList - if you give it a list (an Array, in JS) of Deferreds, it will call your callback function when all of them have either produced a result or raised an exception - and give you the results in the same order as the original list you passed in.
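(Continuing the hypothetical JS port, that might look like this; ids is a made-up Array:)

// Kick off all the queries up front...
var ds = ids.map(function (id) {
  return asynchronousDB.query('SELECT * from something WHERE id = ' + id);
});

// ...then handle every result at once, in the original order.
new DeferredList(ds).addCallback(function (results) {
  // results[i] corresponds to ds[i], whether it succeeded or failed
});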
It is a source of endless frustration to me that despite Twisted having an excellent abstraction for dealing with asynchronous control-flow (one that would be even better with JavaScript's ability to support multi-statement lambda functions), JavaScript frameworks generally continue to struggle along with raw callbacks. Even the frameworks that do support some kind of Deferred or Promise object generally miss some of the finer details. For example, jQuery's Deferred is inferior to Twisted's Deferred: http://article.gmane.org/gmane.comp.python.twisted/22891
The differences between your example and common JavaScript practice for promises (when they're used; most of the time they aren't) are that .then() is used instead of addCallback, and that chaining is available and taken advantage of.
Ah, yes. Twisted's Deferreds do support that kind of chaining, but I didn't use it in my original snippet because I didn't want to have an example of a Deferred where no Deferreds were actually visible. :)
In my own code, I tend not to use chaining because "methods returning self" is not a common idiom in Python (although tools like jQuery have given it currency in the JS world) and because I haven't yet figured out a way of formatting a multi-line method invocation that doesn't look messy.
For starters, since it's event based, nothing can happen until you give control back to the event loop. Secondly, if you add another callback or errback to a Deferred that has already been fired, it just calls it with the last result. Note, Deferred is much simpler than I imagine you are thinking it is. It does nothing to handle events; it is just a first-class way to represent the flow of code.
Nice article. Be sure to read the comments, as the author links to a library that makes the second example easy to rewrite in a short and elegant way. https://github.com/caolan/async#forEach
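(With async.forEach, the loop example collapses to something like this; ids is a made-up Array:)

var async = require('async');

// The iterator is fired for every item in parallel; the final callback
// runs once they have all finished, or as soon as one passes an error.
async.forEach(ids, function (id, callback) {
  asynchronousDB.query('SELECT * from something WHERE id = ' + id, callback);
}, function (err) {
  // all queries finished (or err is set)
});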
Also the first example, the cache hitting and missing, could be rewritten with async, too.
async.waterfall([
  function(callback) {
    asynchronousCache.get("id:3244", callback);
  },
  function(myThing, callback) {
    if (myThing == null) {
      asynchronousDB.query("SELECT * from something WHERE id = 3244", callback);
    } else {
      callback(null, myThing); // waterfall expects (err, result), so pass null for err
    }
  },
  function(myThing, callback) {
    // We now have a thing from the DB or cache; do something with the result
    // ...
  }
]);
From a readability standpoint I'll take the "old" version any day:
function getFromDB(foo) {
  var result = asynchronousCache.get("id:3244");
  if (null == result) {
    result = asynchronousDB.query("SELECT * from something WHERE id = 3244");
  }
  return result;
}
There's a die-hard core of callback proponents (especially in twisted-land and lately in node-land) who claim the pure callback style is more predictable, robust, and testable.
This is not my experience. I've been through that with twisted (heavily), some with EventMachine and some with node.js.
The range of use-cases where I'd benefit from that style was extremely narrow.
For most tasks it would turn into a tedium of keeping track of callbacks and errbacks, littering supposedly linear code-paths with a ridiculous number of branches, and constantly working against test-frameworks that well covered the easy 90% but then fell down on the interesting 10% (i.e. verifying the interaction between multiple requests or callback-paths).
I'm sticking to coroutines where possible now (eventlet/concurrence) and remain baffled by the node crew's resistance to adding meaningful abstractions to the core.
I like javascript a lot (more so with coffee), but I see little benefit in dealing with the spaghetti when that doesn't even give me transparent multi-process or multi-machine scalability.
And to prevent the obligatory: Yes, I know about Step, dnode and the likes. They remain kludges as long as the default style (i.e. the way all libraries and higher level frameworks are written) is callback-bolognese.
It's possible to have the best of both worlds by using node-fibers (https://github.com/laverdet/node-fibers) and mixing synchronous and asynchronous styles as appropriate.
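(A minimal sketch of the fiber pattern, assuming node-fibers' Fiber/yield API and the article's placeholder cache:)

var Fiber = require('fibers');

// Block the *fiber* (not the event loop) until an async call completes.
function wait(start) {
  var fiber = Fiber.current;
  start(function (err, result) {
    fiber.run(result);  // resume the fiber; error handling omitted in this sketch
  });
  return Fiber.yield(); // suspend this fiber until run() is called
}

Fiber(function () {
  var myThing = wait(function (cb) {
    asynchronousCache.get('id:3244', cb);
  });
  // straight-line code from here on
}).run();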
I believe that JavaScript could become the dominant language on the server. We just need to have a set of consistent synchronous interfaces across the major server side JavaScript platforms. This would allow for innovation and code reuse higher up the stack.
I'm doing my bit by maintaining Common Node (https://github.com/olegp/common-node), which is a synchronous CommonJS compatibility layer for Node.js.
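(The style such a layer enables is the JSGI shape, roughly like this sketch; I haven't verified Common Node's exact surface:)

// The handler simply returns a response object; any I/O along the way
// reads as synchronous code (fibers do the suspending underneath).
exports.app = function (request) {
  return {
    status: 200,
    headers: { 'Content-Type': 'text/plain' },
    body: ['Hello World!']
  };
};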
But, um, concurrently means "at the same time," too. Merriam's first definition is actually "running parallel."
Wouldn't it be better to describe it as running serially, using non-blocking asynchronous function calls? Guess that doesn't really roll off the tongue, though.
In computer science the two terms are generally distinguished. While you're free to redefine them, commonly when people say concurrency they refer to a model of performing things at the same time that may not actually be simultaneous. Parallelism generally refers to things actually running simultaneously.
When you write an async server in C++, where you can't inline functions, you write functions like OnRead(), OnWrite(), etc. Once you get used to it, the whole thing ends up fairly easy to read and understand. E.g.:
function handler(yes, no) {
  return function (err, data) {
    if (data) {
      yes(err, data);
    } else {
      no(err, data);
    }
  };
}

function get() {
  function done(err, data) {
    // do something with data
  }
  function db() {
    asynchronousDB.query("SELECT * from something WHERE id = 3244", done);
  }
  asynchronousCache.get("id:3244", handler(done, db));
}
Though async event-driven programming is somewhat confusing in the beginning, there are some idioms that can make your code more comprehensible.
My experience (mostly in Perl: EV, AnyEvent, etc.) is that combining events with finite state machines gives more structured code, with smaller functions that interact in a predefined manner.
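(The same idea sketched in JS rather than Perl; connection stands in for any event source, and the states are made up:)

// Each state is a small named handler, and each event just selects
// the next state, instead of nesting callbacks ever deeper.
var states = {
  idle: function (event) {
    return event === 'connect' ? 'handshaking' : 'idle';
  },
  handshaking: function (event) {
    return event === 'greeting' ? 'ready' : 'closed';
  },
  ready: function (event) {
    return event === 'quit' ? 'closed' : 'ready';
  }
};

var state = 'idle';
connection.on('event', function (event) {
  state = states[state](event);
});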
The V8 team is thinking of adding yield/defer support to make programming in Node neater. There's hope yet.
Meanwhile there are other choices that are about as easy, like Python libraries and Google's Go. Too bad they don't have the same zealous community support.
> The V8 team is thinking of adding yield/defer support to make programming in Node neater. There's hope yet.
There is SpiderNode, not sure what the status of it is, but it replaces V8 in node.js with SpiderMonkey. SpiderMonkey already has yield and much other new JS syntactic sugar.
He mentions they're working closely with the node team here. And that whole talk is about fixing up JavaScript into a modern language: removing the weird syntax quirks around classes, modules, etc., so you say what you mean instead of writing the weird closure soup.
While I am also drawn to NodeJS, I wonder if it wouldn't make more sense to use a language that supports coroutines. Not sure which ones would apply - probably Racket, as they seem to do everything?
If you're interested in keeping up to date with the project I describe below, please follow me on twitter @NirvanaCore.
I had many of the same concerns with node.js. Every time I attempted to wrap my head around how I'd write the code I needed to write, it seemed like node was making it more complicated. Since I learned erlang several years ago, and first started thinking about parallel programming a couple decades ago, this seemed backwards to me. Why do event driven programming, when erlang is tried and true and battle tested?
The reason is, there isn't something like node.js for erlang, and so I set out to fix that.
For about a year I've been thinking about the design, and for a couple of months I've been implementing a new web application platform that I'm calling Nirvana. (Sorry if that sounds pretentious. It's my personal name for it; I've been storing up over a decade's worth of requirements for my "ideal" web framework.)
Nirvana is made up of an embarrassingly small amount of code. It allows you to build web apps and services in coffeescript (or javascript) and have them execute in parallel in erlang, without having to worry too much about the issues of parallel programming.
It makes use of some great open source projects (which do all the heavy lifting): Webmachine, erlang_js and Riak. I plan to ship it with some appropriate server side javascript and coffee script libraries built in.
Some advantages of this approach (from my perspective):
1) Your code lives in Riak. This means rather than deploying your app to a fleet of servers, you push your changes to a database.
2) All of the I/O actions your code might do are handled in parallel. For instance, to render a page, you might need to pull several records from the database, and then based on them, generate a couple map/reduce queries, and then maybe process the results from the queries, and finally you want to render the results in a template. The record fetches happen in parallel automagically in erlang, as do the map/reduce queries, and components defined for your page (such as client js files, or css files you want to include) are fetched in parallel as well.
3) We've adopted Riak's "No Operations Department" approach to scalability. That is to say, every node of Nirvana is identical, running the same software stack. To add capacity, you simply spin up a new node. All of your applications are immediately ready to be hosted on that node, because they live in the database.
4) Caching is built in, so you don't have to worry about it. It is pretty slick (or I think it will be pretty slick) because Basho did all the heavy lifting already in Riak. We use a Riak in-memory backend; recently accessed data is stored in RAM on one of the nodes. This means each machine you add to your cluster increases the total amount of cache RAM available.
5) There's a rudimentary sessions system built in, and built in authentication and user accounts seem eminently doable, though not at first release. Also templating, though use any js you want if you don't like the default.
So, say, you're writing a blog. You write a couple handlers, one for reading an article, one for getting a list of articles and one for writing an article. You tie them to /, /blog/article-id, and /post. For each of these handlers, any session information is present in the context of your code.
To get the list of articles, you just run the query, format the results as you like with your template preference, and emit the HTML. If it is a common query, you just set a "freshness" on it, and it will be cached for that long. (E.g. if you post new articles once a week, you could set the freshness to an hour and it would pull results from the cache, only doing the actual query once an hour.)
To display a particular article, run a query for the article id from the URL (which is extracted for you) and, again this can be cached. For posting, you can check the session to see if the person is authorized, or the header (using cookies) and push the text into a new record, or update an existing record. Basically this is like most other frameworks, only your queries are handled in parallel.
The goal is to allow rapid development of apps, easy code re-use, and easy, built-in scalability, without having to think much about scalability, or have an ops department.
This is the very first time I've publicly talked about the project. I think that I'm doing something genuinely new, and genuinely worth doing, but it's possible I've overlooked something important, or otherwise embarrassed myself. I don't mean to hijack this thread, but felt that I needed to out my project sometime. A real announcement will come when I ship.
If you're interested in keeping up to date with the project I describe above, please follow me on twitter @NirvanaCore.
EDIT TO ADD:
-- This uses Riak as the database with data persisted to disk in BitCask. The Caching is done by a parallel backend in Riak (Riak supports multiple simultaneous backends) which lives in RAM. So, the RAM works as a cache but the data is persisted to disk.
You know what I'm going to say, right? This is HN, we have but one great commandment, which we repeat like parrots!
Ship it tomorrow! ;)
Yes, you have overlooked something important, and there will be something to be embarrassed by -- whether it turns up next week or next decade -- and we'll all have a good laugh. Don't sweat it. And don't worry that the thing isn't finished; the kind of geeks who might sign on at this stage like unfinished things; that is why they can't resist reinventing the wheel. Plus, it doesn't have to be finished to give people ideas, which is half the point. You are ready to start spreading the news; your writeup says as much.
The public repository beckons!
(Frankly, this sounds like a great experiment, although I would never be too quick to predict the end of the ops department. ;)
I agree completely. I just need to close the loop, so that people who use it are inspired to add to it, rather than frustrated trying to understand it. Once I can provide a trivial example of using it that makes sense, I'll ship. I can promise you I'm not holding out for some ideal!
I also shouldn't predict the end of the ops department, until I've had it running in production with a significant number of users.
I think it would be better to say that my goal is to have the ops department working on really interesting stuff, rather than shepherding a fleet of servers, every one of which has a different configuration.
Regarding 1), it is a great way to do things (never copy files around), but just make sure you have good release tools built around the db records. The approach that we use is that when things are published they automatically have a 'dev' release record for that version for the user that published it (so they can test) and individual versions have to be promoted ultimately to 'prod' for 'everyone'. The version you see is basically the most recent version released to your context (basically you and/or what machine you are on). The key is having great tools built around this process before something blows up and there is an emergency.
This is going to be an opportunity for the community. :-)
I have some plans in this area, but I couldn't guess how to best fit into other people's workflows.
What might be nice is if there were a way to sync a git repository with Riak, so that Nirvana could just pull the relevant code from that. It seems like it would be the best solution, but looking into that, from possibly integrating with a GitHub API (do they have one?) down to command line scripts, is something I'm punting on to focus on the essentials.
Again, essentially what we do in concept. The dev side is a separate db (git could represent this), and once something is published past dev the record is copied to the production db (Riak in your case). The dbs have in-memory local machine cached versions, for speed, but also for reliability. You don't want your N machines depending on one db point of failure, so the local cached copies exist to mitigate that. (They receive updates via a multicast mechanism.)
That looks like an interesting thing, but it is a something for node.js, not a "node.js for erlang".
By which, I meant to say "platform for building server applications in javascript, backed by the power of the erlang OTP platform."
Node.js gives server side javascript a platform, that's great. What I'm working on is giving server side coffeescript and javascript access to the erlang platform (and some really great erlang technologies.)
If you're not feeling too proprietary, hit me up on twitter. Might be some areas where we can collaborate. Either way, good luck with your project as well. I tried to not do this myself, I really did. :-)
I don't use Twitter much, but I'll definitely take you up on that when I have something a little more developed. I'm coming at this from the exact opposite direction as you (what I'm doing is closer to porting Erlang to Node than vice versa), but we're obviously thinking along the same lines— I'd love to compare notes.
I think it's a job for the jquery team, a node.js for the rest of us. Once they get jquery mobile out the door it would seem to be the most obvious next project.
node.js is node.js for the rest of us :) jQuery smooths out browser inconsistencies and replaces an overly verbose API (DOM). node.js has neither of these issues.
Huh? I don't follow -- do you mean that jQuery is teaching people to use anonymous functions everywhere and that's sometimes similar to how some people code with node?