But you say things like this and it worries me. Either you said this because you feel defensive about your project or you said it because you genuinely don't understand the cases we're talking about here. And this is a problem because a lot of people look up to you and what you say, so when you say something as baffling as this response, you run the risk of leading a lot of people astray.
I was sort of at a loss for how to reply in the time I have to spare for Hacker News, but thankfully Aphyr did for me. But let me clarify what I said a bit, since I was a bit terse: The problem Node.js has is a social one. A lot of node hackers take the stance, "I thought Node.js solved the problems threads presented," (https://groups.google.com/d/msg/nodejs/eVBOYiI_O_A/kv6iiDyy9...) like there is a single axis of superiority and Node.js sits above the methods that came before. But the reality is that Node.js is really just another possible implementation in the Ruby/Python/Pike/-Perl-¹ space, and shares most of the same characteristics as those languages.
So you have a lot of people who are aces at front-end programming in the browser thinking they have a uniformly superior tool for tackling high-performance server problems, but really they don't have that; they just have a tool with familiar syntax. And so they fearlessly (and perhaps admirably) charge into the breach of platform programming without realizing that the way people scale big projects involves a lot of tools, a lot of thought about failure modes, and a lot of well-established algorithms with very specific tradeoffs.
And so this is Node.js's problem. It's just another gun in the gunfight, but its community thinks it's a cannon. In a world where high-performance parallel VMs like Java or Erlang have very powerful and helpful languages like Clojure or Scala on top, we're in a funny situation. It becomes increasingly difficult to justify all these GIL-ridden implementations of languages.
¹ Correction, perl 6 doesn't plan to share this behavior. What I read suggests it's not done yet.
I have a lot of problems with Node myself - but the single event loop per process is not one of them. I think that is a good programming model for app developers. I love Go so much (SO MUCH), but I cannot get past the fact that goroutines share memory or that it's statically typed. I love Erlang but I cannot get past the syntax. I do not like the JVM because it takes too long to start up and has a bad history of XML files and IDE integration - which give me a bad vibe. Maybe you don't care about Erlang's syntax or static typing but this is probably because you're looking at it from the perspective of an engineer trying to find a good way to implement your website today. This is the source of our misunderstanding - I am not an app programmer arguing about what the best platform to use for my website is--I'm a systems person attempting to make programming better. Syntax and overall vibe are important to me. I want programming computers to be like coloring with crayons and playing with duplo blocks. If my job was keeping Twitter up, of course I'd be using a robust technology like the JVM.
Node's problem is that some of its users want to use it for everything? So what? I have no interest in educating people to be well-rounded pragmatic server engineers, that's Tim O'Reilly's job (or maybe it's your job?). I just want to make computers suck less. Node has a large number of newbie programmers. I'm proud of that; I want to make things that lots of people use.
The future of server programming does not have parallel access to shared memory. I am not concerned about serialization overhead for message passing between threads because I do not think it's the bottleneck for real programs.
I cannot understand this hangup about syntax. Syntax is the easiest thing to learn with a new language, you just look it up. It's the semantics and usage where the real problems are.
Erlang doesn't have static typing, it is dynamically typed. It has always been dynamically typed.
And however you look at it doing highly concurrent systems with processes is much easier. And who says that processes imply "parallel access to shared memory"? Quite the opposite actually.
Really the only really tricky thing for a newbie is strings.
As Larry Wall famously said, "Lisp has all the visual appeal of oatmeal with fingernail clippings mixed in."
I'm not the author of the comment above, but I think Erlang's syntax is effective in that it strongly emphasizes computation by pattern matching. If you write very imperative code in it (as people tend to, coming from Ruby or what have you), yes, it will look gnarly. Good Erlang code looks qualitatively different. There are pretty good examples of hairy, imperative Erlang code being untangled in this blog post: http://gar1t.com/blog/2012/06/10/solving-embarrassingly-obvi...
The . , ; thing is a bit of a hack, admittedly -- I suspect that comes from using Prolog's read function to do parsing for the original versions of Erlang (which was a Prolog DSL), and reading every clause of a function definition at the same time. Prolog ends every top-level clause with a period, not "; ; ; ; .". (Not sure, but a strong hunch, supported by Erlang's history.) I got used to it pretty quickly, though.
It is very simple really, think of sentences in English and it all becomes trivially simple. How many sentences end in a ';'? And you almost never need to explicitly specify blocks.
Here http://ferd.ca/on-erlang-s-syntax.html are some alternate ways of looking at it.
(Hello, Robert! :) )
Anyways, my experience is that all languages will get people criticizing them. And, in my experience, those kinds of criticisms should almost always be categorized as "does not want to talk about language FOO" and a proper response is probably something like "if you don't want to give that subject the respect it deserves, let's change the subject to something you find interesting".
Syntax benefits are not all about subjectivity. Anyone claiming this has effectively lost the plot and decided to go turtle in the discussion.
There is no getting around the difference in semantics. For this I think that having a different syntax is actually better. Also there are things in Erlang which are hard to fit syntactically into the syntax of OO languages, for example pattern-matching.
I also thought this. For some reason though the more I use it the more I am coming around to its syntax. There are lots of gotchas when compared with other popular languages, but still its syntax has grown on me recently.
At the opposite end of the syntactic spectrum is Lisp. The more I use Clojure the more I am loving it as well.
"The future of server programming does not have parallel access to shared memory."
I agree. The future of server programming is also not spawning multiple child processes. The question I have is how far off is that future? I know that most of my web applications today simply run in multiple spawned OS processes.
Barring a massive shift in hardware architectures, shared access by cooperating threads is your only option for high-performance shared state. Look at core counts vs core flops for the last five years. Look at Intel's push for NUMA architectures. This is a long-scale trend forced by fundamental physical constraints with present architectures, and I don't see it changing any time soon.
Anyone telling you shared state is irrelevant is just pushing the problem onto someone else: e.g., a database.
What about STM in Clojure? It's technically parallel access to shared memory, but the transactional nature obviates the need for mutexes and all the crap that makes shared memory a pain in the ass.
Programming is a verb describing what people do with programming systems, and we're nowhere near the point where it can be done automatically yet. You cannot remove people from the equation and claim to be interacting with it.
It really kinda sucks for them right now.
Serious hard-core engineers that need serious tools are actually pretty well served by current tools. No, no, they're not perfect. But we're way better off than somebody who's just beginning in terms of what tools are aimed at us.
Personally I like static typing, especially if somewhat optional, it's something we effectively do through documentation anyway (via JSdoc or similar), but makes it concrete.
I don't think light-weight threads sharing memory is so bad, symmetric coroutines are more or less the same as an event loop IMO, the thought put into working with them is more or less identical, just without callback hell and odd error-handling, but I suppose going all-out with message passing could be fine. I think that's still a bit of an implementation detail unless you get rid of the concept of a process altogether and start just having a sea of routines that talk to each other.
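To make the coroutine/event-loop comparison concrete, here is a minimal sketch using JavaScript generators. Caveats: generators are asymmetric coroutines (they yield back to their caller), so the small round-robin scheduler stands in for the direct task-to-task transfer a symmetric coroutine would do. The names `runAll` and `worker` are made up for illustration.

```javascript
// Round-robin scheduler: each `yield` is a cooperative switch point,
// much like returning to the event loop between callbacks.
function runAll(tasks) {
  const queue = tasks.map((task) => task());
  while (queue.length > 0) {
    const current = queue.shift();
    const { done } = current.next();
    if (!done) queue.push(current); // not finished, schedule it again
  }
}

const log = [];

// Hypothetical task: records each step, then gives up control.
function* worker(name, steps) {
  for (let i = 0; i < steps; i++) {
    log.push(`${name}:${i}`);
    yield;
  }
}

runAll([() => worker("a", 2), () => worker("b", 2)]);
// log is now ["a:0", "b:0", "a:1", "b:1"]
```

The control flow inside `worker` reads top to bottom, and an exception thrown there would surface through `current.next()` rather than getting lost in a callback.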
My favorite: Joxa a Clojure inspired (really just inspired ;)) lisp http://joxa.org/
Then there are Elixir (mentioned already) and Reia,
Both seem to be inspired by Ruby.
Erlang has a far superior computational model and implementation than Node, it can handle way more requests faster and is friendlier to newcomers as any web request is simply a message received.
Thank you for the nice write up.
What I'd contrast against your view- and I agree with most of your premise, that node is extremely seductive and dangerous and many are apt to get in way way way over their head- is that the comforts you describe are what kill these other languages, what strangle and prevent us from becoming better, more understanding programmers. Ruby, python, php, less so perl, the webdev that goes on there happens by and large at extreme levels of abstraction: developers flock to the known explored center, the tools with the most, the places that seem safest.
The dangerous dangerous scenario presented by most good web development tools is that it is the tools that know how to run things. Contrary to the charge into the breach throw up ad-hoc platforms in production every day mentality (of node), these (ruby, php, python) platforms stagnate, they fall to ruin as their tooling strives towards ever reaching greater heights: the tools accrue more and more responsibility, there are better carved out & expected ways to do things, and incidental complexity, the scope of what must be known, how far one has to travel, to get from writing a page to it getting shipped over the wire or executing, balloons.
If anything, Node's core lesson to the world has been about how much is not required. Connect, the only & extremely extremely low-lifed common denominator of Node web world, is the meager-est, tiniest smallest iota of a pluggable middleware system (if only Senchalabs had been courteous enough to be more up front about it being a complete and total rip off of Commons-Chain & to not add a thing, I would not bloody loathe it). That pattern? bool execute(Context context). Did you handle this request? No? Ok, next. You need to deploy a bunch of processes on a bunch of boxes? You can probably write up something perfectly adequate in a week. Don't have a week? Go find a module: certainly Substack has at least one for whatever your cause (here it's Fleet, https://github.com/substack/fleet).
Node modules are wonderful. They all have some varyingly long list of dependencies, usually the tree is 4-8 different things, but the total amount of code being executed from any given module is almost always short of a couple dozen KB: your engineering team can come in and understand anything in a day or three, and gut it and rebuild it in another day or two. Modules, unlike how development processes have shaped up in the past decade, are wonderfully delightfully stand-alone: there are no frameworks, no crazy deployment systems, no bloody tooling one is writing to: it's just a couple of functions one can use. The surface area, what is shown, is tiny, is isolated, is understandable, there's no great deep mesh. This runs so contrary to the Drupal, to the Rails, to the Cakes or Faces of the world where one is not writing a language, they're at the eighth degree of abstraction writing tools for a library that implements enhancements for a framework that is a piece of an ioc container that runs on an application server that runs on a web server that runs in a runtime that actually does something with the OS.
We need to get more developers willing to charge into the breach and break out a gun fight. This stuff is not that complicated,* and the tools we have are hiding that fact from us more often than not.
So, I admire and love approaches like Vert.x, that take the reactor pattern (what powers Node) and blow it up to the n-th degree, that solve deployment challenges, but at the same time I don't think there is a huge amount of magic there: most node developers have not advanced their runtimes to the level of parity that is called for yet, but this I do not see as a colossal problem. Node, shockingly, even when hideously under tooled, under supported, under op'ed, seems to stand up and not fall over in a vast number of cases. Problems of the rich, good problems to have, when your node system is having worrisome performance problems: most projects will not scale this big, Node will just work, and hopefully you have enough actual genuine talent with enough big picture understanding etc on your side to not be totally frozen out when your traffic goes up 10x in two days and no one anticipated it. Node is not a land for hand holding, and I don't think that's a bad thing: I think it'll help us listen better to our machines, to not follow our tooling's lead into the breach, but to consider what it is we really actually are building for ourselves.
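For anyone who has not seen the "did you handle this? no? next" pattern, a rough sketch follows. This is the `bool execute(Context context)` shape described above, not Connect's actual API (Connect middleware takes `req`, `res`, `next`). The handler names and the fields on the context object are made up for illustration.

```javascript
// Walk the chain until some handler claims the request.
function runChain(handlers, context) {
  for (const handler of handlers) {
    if (handler(context)) {
      return true; // handled, stop here
    }
  }
  return false; // nobody claimed it
}

// Hypothetical handler: only answers one path, otherwise passes.
const serveHealth = (ctx) => {
  if (ctx.path !== "/health") return false;
  ctx.body = "ok";
  return true;
};

// Hypothetical catch-all at the end of the chain.
const serveNotFound = (ctx) => {
  ctx.status = 404;
  ctx.body = "not found";
  return true;
};

const handlers = [serveHealth, serveNotFound];
```

Calling `runChain(handlers, { path: "/health" })` stops at the first handler; any other path falls through to the catch-all. That is more or less the whole idea.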
this is the sanest and most pragmatic way to serve a web server from multiple threads
it's all serialization - but that's not a bottleneck for most web servers.
I disagree, especially for a format like JSON. In fact, every web app server I've dug into spends a significant amount of time on parsing and unparsing responses. You certainly aren't going to be doing computationally expensive tasks in Node, so messaging performance is paramount.
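If you want to check the serialization cost on your own machine, something like the sketch below would do. It assumes messages cross the process boundary as JSON text and only measures the encode/decode step, not the pipe or the context switches. `roundTrip` and `timeRoundTrip` are made-up names, and no timings are claimed here.

```javascript
// Encode and decode `count` tiny messages, the way a JSON-based
// IPC channel has to for every message it carries.
function roundTrip(count) {
  let sum = 0;
  for (let i = 0; i < count; i++) {
    const wire = JSON.stringify({ id: i });
    sum += JSON.parse(wire).id;
  }
  return sum;
}

// Wrap the loop with a wall-clock measurement in milliseconds.
function timeRoundTrip(count) {
  const start = process.hrtime.bigint();
  const sum = roundTrip(count);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { sum, elapsedMs };
}
```

Comparing `timeRoundTrip(n).elapsedMs` against a loop that just adds the integers gives a rough idea of how much of the budget goes to parsing and unparsing.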
i'd love to hear your context-switching free multicore solution.
I claimed no such thing: only that multiprocess IPC is more expensive. Modulo syscalls, I think your best bet is gonna be n-1 threads with processor affinities taking advantage of cas/memory fence capabilities on modern hardware.
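For readers unfamiliar with cas, here is a minimal sketch of a compare-and-swap retry loop. Big caveat: it uses `SharedArrayBuffer` and `Atomics`, which are present-day JavaScript features and not something Node had when this thread was written; the comment above is about native threads. Only a single-threaded call is shown. In a real setup the same buffer would be handed to several worker threads, and processor affinity is not something this sketch touches.

```javascript
// One 32-bit slot of memory that could be shared between threads.
const shared = new SharedArrayBuffer(4);
const counter = new Int32Array(shared);

// Lock-free increment: read the slot, then try to swap in value + 1.
// If another thread got there first, the swap fails and we retry.
function casIncrement(view, index) {
  for (;;) {
    const seen = Atomics.load(view, index);
    // compareExchange returns the value it found in the slot.
    if (Atomics.compareExchange(view, index, seen, seen + 1) === seen) {
      return seen + 1;
    }
  }
}
```

No mutex and no syscall on the fast path; under contention the cost shows up as retries instead of context switches.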
What is this I can't even.
Note that I picked the really small messages here--integers, to give node the best possible serialization advantage.
$ time node cluster.js
Finished with 10000000
$ pidstat -w | grep node
11:47:47 AM 25258 48.22 2.11 node
11:47:47 AM 25260 48.34 1.99 node
Compare that to a multithreaded Clojure program which uses a LinkedTransferQueue--which eats 97% of each core easily. Note that the times here include ~3 seconds of compilation and jvm startup.
$ time lein2 run queue
"Elapsed time: 55696.274802 msecs"
$ pidstat -tw -p 26537
Linux 3.2.0-3-amd64 (azimuth) 07/29/2012 _x86_64_ (2 CPU)
11:52:03 AM TGID TID cswch/s nvcswch/s Command
11:52:03 AM 26537 - 0.00 0.00 java
11:52:03 AM - 26540 0.01 0.00 |__java
11:52:03 AM - 26541 0.01 0.00 |__java
11:52:03 AM - 26544 0.01 0.00 |__java
11:52:03 AM - 26549 0.01 0.00 |__java
11:52:03 AM - 26551 0.01 0.00 |__java
11:52:03 AM - 26552 2.16 4.26 |__java
11:52:03 AM - 26553 2.10 4.33 |__java
$ time lein2 run atom
"Elapsed time: 969.599545 msecs"
$ pidstat -tw -p 26717
Linux 3.2.0-3-amd64 (azimuth) 07/29/2012 _x86_64_ (2 CPU)
11:54:49 AM TGID TID cswch/s nvcswch/s Command
11:54:49 AM 26717 - 0.00 0.00 java
11:54:49 AM - 26720 0.00 0.01 |__java
11:54:49 AM - 26728 0.01 0.00 |__java
11:54:49 AM - 26731 0.00 0.02 |__java
11:54:49 AM - 26732 0.00 0.01 |__java
Also Node starts up in 35ms and doesn't require all those parentheses - both of which are waaaay more important.
A great example of this is resque. It'd be great if we could have multiple resque workers per process per job type. This would save a ton of resources and greatly improve processing for very expensive classes of jobs. It's a very real-world consideration. But instead, because our architecture follows this "share nothing in my code, pass that buck to someone else" model like a religion, we waste a lot of computing resources and lose opportunities for better reliability.
What I find most confusing about this argument is that I challenge you to find me a website written in your share nothing architecture that, at the end of the day, isn't basically a bunch of CRUD and UX chrome around a big system that does in-process parallelism for performance and consistency considerations. Postgresql, MySQL, Zookeeper, Redis, Riak, DynamoDB ... all these things are where the actual heavy lifting gets done.
Given how pivotal these things are to modern websites, it's bizarre for you to suggest it is not something to consider.
In addition to that, processes severely inhibit the usefulness of in-process caches. Where threads would allow a single VM to have a large in-process cache, processes generally prevent such collaboration and mean you can only have multiple, duplicated, smaller in-process caches. (Yes, you could use SysV shared memory, but that's also fraught with issues)
The same goes for any type of service you would like to run inside a particular web server that could otherwise be shared among multiple threads.
Web serving is OK & all, but I'd love if node could be an ideal runtime for petri-nets and webworker meshes too.