Ask HN: On Rob Pike's Concurrency is not Parallelism?
51 points by lazydon on July 28, 2012 | 70 comments
This is regarding the slides by Rob Pike with the above title. Every time I go through them I feel like a moron. I'm not able to figure out the gist. It's well understood that concurrency is decomposition of a complex problem into smaller components. If you cannot correctly divide something into smaller parts, it's hard to solve it using concurrency.

But there isn't much detail in slides on how to get parallelism once you've achieved concurrency. In the Lesson slide (num 52), he says Concurrency - "Maybe Even Parallel". But the question is - When and How can concurrency correctly and efficiently lead to Parallelism?

My guess is that, under the hood, Rob's pointing out that developers should work at the level of concurrency, and parallelism should be the language's/VM's concern (gomaxprocs?). Just care about intelligent decomposition into smaller units, concerned only about correct concurrency; parallelism will be taken care of by the "system".

Please shed some light.

Slides: http://concur.rspace.googlecode.com/hg/talk/concur.html#title-slide
HN Discussion: http://news.ycombinator.com/item?id=3837147

I believe the concept says to focus more on task-based parallelism rather than data-based parallelism.

In Go, it is easy to create multiple tasks/workers, each with a different job. This is implicitly parallelizable: each task can (but doesn't have to) run in its own thread. The only time the workers can't run in parallel is when they are waiting on communication from another worker or an outside process.

This is opposed to data-level parallelism, where each thread executes nearly the same instructions on different input, with little to no communication between the threads. An example would be increasing the blue level of each pixel in an image: each pixel can be operated on individually, so the work can be performed in parallel.
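To make the pixel example concrete, here is a minimal Go sketch of data parallelism. The `Pixel` type, the `boostBlue` function, and the chunking scheme are all made up for illustration; they are not from the slides or from any image library.

```go
package main

import (
	"fmt"
	"sync"
)

// Pixel is a hypothetical RGB pixel type used only for this illustration.
type Pixel struct{ R, G, B uint8 }

// boostBlue raises the blue channel of every pixel by delta, saturating at 255.
// The slice is split into contiguous chunks, one goroutine per chunk. No chunk
// touches another chunk's pixels, so no locking is needed.
func boostBlue(img []Pixel, delta uint8, workers int) {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(img) + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < len(img); start += chunk {
		end := start + chunk
		if end > len(img) {
			end = len(img)
		}
		wg.Add(1)
		go func(part []Pixel) {
			defer wg.Done()
			for i := range part {
				if part[i].B > 255-delta {
					part[i].B = 255 // saturate instead of wrapping around
				} else {
					part[i].B += delta
				}
			}
		}(img[start:end])
	}
	wg.Wait()
}

func main() {
	img := []Pixel{{10, 20, 30}, {0, 0, 250}, {5, 5, 5}}
	boostBlue(img, 10, 2)
	fmt.Println(img)
}
```

Every goroutine runs the same loop body on a different slice of the input, which is the "same instructions, different data" shape described above.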

So the push is for more task-based parallelism in programs. It is very flexible in that it can actually run in parallel or sequentially without changing the outcome of the program.
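As a rough sketch of the task-based style, the Go program below starts two workers that do different jobs on the same input and report back over channels. The function names (`countWords`, `countVowels`, `analyze`) are hypothetical and chosen only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// countWords is one task: it reports the number of whitespace-separated words.
func countWords(text string, out chan<- int) {
	out <- len(strings.Fields(text))
}

// countVowels is a different task: it reports the number of ASCII vowels.
func countVowels(text string, out chan<- int) {
	n := 0
	for _, r := range strings.ToLower(text) {
		if strings.ContainsRune("aeiou", r) {
			n++
		}
	}
	out <- n
}

// analyze runs both tasks as goroutines and collects their results.
// The answer is the same whether the runtime schedules them on one
// core (interleaved) or on two cores (in parallel).
func analyze(text string) (words, vowels int) {
	wc := make(chan int)
	vc := make(chan int)
	go countWords(text, wc)
	go countVowels(text, vc)
	return <-wc, <-vc
}

func main() {
	w, v := analyze("concurrency is not parallelism")
	fmt.Println("words:", w, "vowels:", v)
}
```

The only points where one goroutine waits on another are the channel receives in `analyze`, which matches the remark above that workers can run in parallel except when they are waiting on communication.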

I don't agree with that; task parallelism is easier than data parallelism in an actor/CSP-based system, but both have their place. Take something like x264 -- task parallelism will not help it, unless you're encoding multiple videos at one time. But data parallelism (SIMD, in particular) is the reason it's the fastest encoder around.

In general, task parallelism can model anything data parallelism can, and given "a sufficiently smart compiler" you can end up with the same result. This is an informal corollary of Needham's Duality (which is itself informal, so make of it what you will).

Our current hardware tends to offer great data parallelism for homogeneous task queues and task parallelism for heterogeneous task queues. Given that, task parallelism needs a lot more consideration from a human. It's also the case that our current implementations of data parallelism tend to focus on shared memory computations, so their scope is a lot more limited than the distributed-system-conflated discipline of task-oriented concurrency, where we're currently having an explosion of engineering.

By a "sufficiently smart compiler" you mean autovectorization. That's a hard problem.

I've mentioned before that I don't think SIMD is going away anytime soon. It has so many upsides (cache locality, simple implementation in hardware due to the single-instruction nature of it) that I think designs that don't take advantage of it will always be at a disadvantage for the foreseeable future.

x264 gets equal use from threads per-frame as it does SIMD per-pixel. There's a pretty much linear speed increase for each new thread, even.

Interesting, I didn't know that video encoding could be parallelized per-frame in that way. That's really cool; I stand corrected.

Still, I think it's clear that an x264 with N threads per frame with no per-pixel SIMD would lose to the current x264 with N threads per frame. The key is that x264 is making good use of task parallelism and data parallelism.

I think his point is that although people are often interested in parallel behavior, they should really focus more on concurrent design to avoid/remove the dependencies that ultimately limit parallelism. Slide 19 mentions automatic parallelization, but his point is that developers should think more about concurrency and not that Go will automatically maximally parallelize concurrent programs.

Yes. For the underlying system to be able to detect and run in parallel you need the concurrency. Without concurrency your application will be run sequentially.

Concurrency is more than decomposition, and more subtle than "different pieces running simultaneously." It's actually about causality.

Two operations are concurrent if they have no causal dependency between them.

That's it, really. f(a) and g(b) are concurrent so long as a does not depend on g and b does not depend on f. If you've seen special relativity before, think of "concurrency" as meaning "spacelike"--events which can share no information with each other save a common past.

The concurrency invariant allows a compiler/interpreter/cpu/etc to make certain transformations of a program. For instance, it can take code like

    x = f(a)
    y = g(b)
and generate

    y = g(b)
    x = f(a)
... perhaps because b becomes available before a does. Both programs will produce identical functional results. Side effects like IO and queue operations could strictly speaking be said to violate concurrency, but in practice these kinds of reorderings are considered to be acceptable. Some compilers can use concurrency invariants to parallelize operations on a single chip by taking advantage of, say, SIMD instructions or vector operations:

    x = f(a)   y = g(b)
Or more often:

    [x1, x2, x3, x4] = [f(a1), f(a2), f(a3), f(a4)]
where f could be something like "multiply by 2".

Concurrency allows for cooperative-multitasking optimizations. Unix processes are typically concurrent with each other, allowing the kernel to schedule them freely on the CPU. It also allows thread, CPU, and machine-level parallelism: executing non-dependent instructions in multiple places at the same wall-clock time.

      CPU1        CPU2
    x = f(a)    y = g(b)
In practice, languages provide a range of constructs for implicit and explicit concurrency (with the aim of parallelism), ranging from compiler optimizations that turn for loops into vector instructions, push matrix operations onto the GPU and so on; to things like Thread.new, Erlang processes, coroutines, futures, agents, actors, distributed mapreduce, etc. Many times the language and kernel cooperate to give you different kinds of parallelism for the same logical concurrency: say, executing four threads out of 16 simultaneously because that's how many CPUs you have.

What does this mean in practice? It means that the fewer causal dependencies between parts of your program, the more freely you, the library, the language, and the CPU can rearrange instructions to improve throughput, latency, etc. If you build your program out of small components that have well-described inputs and outputs, control the use of mutable shared variables, and use the right synchronization primitives for the job (shared memory, compare-and-set, concurrent collections, message queues, STM, etc.), your code can go faster.
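The f(a)/g(b) example above can be sketched in Go as follows. The bodies of `f` and `g` are placeholders invented for this illustration; the only property that matters is that neither reads anything the other writes.

```go
package main

import (
	"fmt"
	"sync"
)

// f and g are stand-ins with no causal dependency on each other.
func f(a int) int { return a * 2 }
func g(b int) int { return b + 100 }

// sequential evaluates f and then g, in program order.
func sequential(a, b int) (x, y int) {
	x = f(a)
	y = g(b)
	return x, y
}

// concurrent lets the runtime schedule f and g in any order, possibly on
// different CPUs. Each goroutine writes a different variable, and wg.Wait
// ensures both writes are visible before the values are returned.
func concurrent(a, b int) (x, y int) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); x = f(a) }()
	go func() { defer wg.Done(); y = g(b) }()
	wg.Wait()
	return x, y
}

func main() {
	fmt.Println(sequential(3, 4))
	fmt.Println(concurrent(3, 4))
}
```

Both functions return the same pair because reordering or overlapping two causally independent operations cannot change their results.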

Hope this helps. :)

I don't think he's saying that you shouldn't concern yourself at all with parallelism; only that you should focus on concurrency first and that will lead to easier parallelism. And I think he is saying that decomposition and concurrency help non-parallel programs stay simple and easy to understand. The benefits of concurrency are greater than just parallelism.

I think that is about what you are saying.

Obviously both terms get used in a variety of overlapping ways. Without looking at how the terms are used in the slides you refer to, I think the proper definitions are:

Concurrency is a property of a program's semantics, usually seen in a 'thread' abstraction. The most important part of concurrency is nondeterminism. Concurrency might permit parallelism depending on hardware, language runtime, OS, etc.

Parallelism is a property of program execution and means multiple operations happening at once, in order to speed up execution. A program written to take advantage of parallelism can be deterministic, but often is accomplished by way of concurrency in OS threads. Because most languages still suck.

The difference between Concurrency and Parallel is in my opinion somewhat subjective, depending on how you view the problem.

Examples of Concurrency:

1. I surf the web and I run an installer for another program.

2. One gopher brings empty carts back, while another brings full carts to the incinerator.

The idea of concurrency is that two completely separate tasks are being done at the same time. There may be synchronization points between the two tasks, but the tasks themselves are dissimilar.

Viewed in one way moving empty wheelbarrows may be completely different from moving filled ones.

Viewed in another way, they might seem very similar.

Concurrency has to do with task parallelism.

Parallel has to do with data parallelism.

There's a gray line between the two where you can't clearly differentiate between them.

IMO, they solve two different goals. Sometimes these goals overlap, but not always. Please someone correct me if I'm wrong, but this is how I see it:

The goal of concurrency is to model a problem that is easier or better or more natural to model concurrently (that is, different parts are running simultaneously). For example, if you are simulating agents in some virtual world (eg a game), then it may make sense that these agents are being modeled in a way that they are all running simultaneously and the processing of one does not block the processing of another. This could be done by timeslicing available processing between each agent (either by using the processor/OS pre-emptive multitasking, if available, or through cooperative multitasking[1]), or it could be done by physically running multiple agents at the same time on different processors or cores or hardware threads (parallelism). The main point is that concurrency may be parallel, but does not have to be and the reason you want concurrency is because it is a good way to model the problem.

The goal of parallelism is to increase performance by running multiple bits of code in parallel, at the same time. Concurrency only calls for the illusion of parallelism, but parallelism calls for real actual multiple things running at the exact same time and so you must have multiple processors or cores or computers or whatever hardware resources for parallelism, while concurrency can be simulated on single core systems. Parallel code is concurrent code, but concurrent code is not necessarily parallel code.
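A small Go sketch of "concurrent but not parallel": the goroutines below are restricted to a single executing thread via `runtime.GOMAXPROCS(1)`, so they can only take turns, yet all of them make progress. The "agent" framing and the `runOnOneCore` name are assumptions made for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runOnOneCore runs n hypothetical "agents" as goroutines while only one
// OS thread is allowed to execute Go code, so they interleave but never
// run in parallel. It returns how many steps each agent completed.
func runOnOneCore(n, steps int) []int {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev) // restore the previous setting

	progress := make([]int, n)
	var wg sync.WaitGroup
	for id := 0; id < n; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for s := 0; s < steps; s++ {
				progress[id]++    // each agent writes only its own slot
				runtime.Gosched() // yield so another agent can take a turn
			}
		}(id)
	}
	wg.Wait()
	return progress
}

func main() {
	fmt.Println(runOnOneCore(3, 5)) // prints [5 5 5]
}
```

Removing the `GOMAXPROCS(1)` call would let the same program run its agents on several cores without any other change, which is the sense in which concurrent code "may be parallel, but does not have to be".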

Distributed programming is parallel programming where the code is running in parallel, but distributed over multiple computers (possibly over the internet at distant locations) instead of running locally on one multi-core machine or an HPC cluster.

From stackoverflow[2]:

    Quoting Sun's Multithreaded Programming Guide:
    Parallelism: A condition that arises when at least two threads are executing simultaneously.
    Concurrency: A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.
As for when concurrency can be turned into parallelism, that depends. Assuming that the hardware resources exist (or that it simply falls back to time-sliced concurrency if they do not), parallelism can be achieved if there are multiple things that can execute independently. There are at least three types of parallelism and if your problem, code and/or data fit one of these, then your concurrent code may be executed in parallel.

1. You have a number of items of data and each one can be processed independently and at the same time. This is the classic embarrassingly parallel data parallelism. For example, you have a large array of input data and an algorithm that needs to run on each one, but they do not interact. Calculating pixel colours on your screen, or handling HTTP requests, for example.

2. You have two or more independent tasks doing different things that are run in parallel. For example, you have one thread handling the GUI and another thread handling audio. Both need to run at the same time, but both run independent of each other with minimal communication (which can happen over a queue, perhaps).

3. Sometimes you have a stream of data which must be processed by a number of tasks one after the other. Each task can be run in parallel so that if you have a stream of data, item[0], item[1], item[2], etc (where 0 is first in the stream, 1 is next and so on) and a number of tasks that need to run in order: A, B, C - then you can run A, B and C in parallel such that A processes item[0] while B and C are idle, then B processes item[0] and A processes item[1] and C is idle. Then C processes item[0] while B processes item[1] and A processes item[2] and so on. This is called pipelining and as you probably know is a very common technique inside processors.

Of course, all three can be combined.
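The third case (pipelining) maps naturally onto goroutines connected by channels. Below is a minimal sketch; the `stage` helper and the three arithmetic steps standing in for tasks A, B and C are invented for illustration.

```go
package main

import "fmt"

// stage starts a goroutine that applies fn to every item received on in
// and forwards the result, closing its output once the input is exhausted.
func stage(in <-chan int, fn func(int) int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- fn(v)
		}
	}()
	return out
}

// runPipeline pushes items through three stages A, B and C. While C is
// working on item[0], B can be working on item[1] and A on item[2].
func runPipeline(items []int) []int {
	src := make(chan int)
	go func() {
		defer close(src)
		for _, it := range items {
			src <- it
		}
	}()

	a := stage(src, func(v int) int { return v + 1 }) // task A
	b := stage(a, func(v int) int { return v * 2 })   // task B
	c := stage(b, func(v int) int { return v - 3 })   // task C

	var results []int
	for v := range c {
		results = append(results, v)
	}
	return results
}

func main() {
	fmt.Println(runPipeline([]int{0, 1, 2}))
}
```

Because each stage is a single goroutine reading from one channel, items leave the pipeline in the order they entered it. Whether the stages truly overlap in time depends on how many cores the runtime is allowed to use.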

[1] Could be a coroutine which is yielded or simply by executing some kind of update function which, by contract, must not block

[2] http://stackoverflow.com/questions/1050222/concurrency-vs-pa...

Take a look at lthread:


lthread supports concurrency and parallelism using pthreads. Each lthread scheduler runs its own lthreads concurrently, or better said, one at a time. But from an observer's perspective they look like they are running in parallel.

Now if you create 2 pthreads on a 2 core machine and each runs an lthread scheduler then you have true parallelism because you can have 2 lthreads running in parallel at the same time. One by each scheduler.

I feel this is a closer context to what Rob is discussing than what I found in the comments here.

Just noticed the license changed to BSD! I had not realised - will definitely need to give it another look as I may be able to make use of it now. Awesome.

Yup, I changed it a couple of months ago :)

Parallelism is when you run your program on multiple processors. Semantics of your program does not change whether you run it on single processor or multiple processors.

Concurrency is when you write your program using multiple threads. Your program looks and means vastly different if you use threads.

You use concurrency not for performance gain, but for clarity of your program. You use parallelism for performance gain, to utilize all your processors.

That can't be right. Threads don't improve the clarity of most programs; they're notoriously unclear.

I agree with you, but not in the way you intended. I suspect that most programming language implementations of concurrency are bolt-ons and are a bit broken in terms of their human (programmer) interface design.

On the contrary, I think proper concurrency constructs, including threads, do improve the clarity of programs. Part of the problem in reasoning about threads is a lack of useful primitives. Multithreaded programming in Java? For me, at least, it's tough. In Erlang? Trivial. In Clojure, if you're willing to deal with the slowness of the STM, it can be beautifully simple.

Erlang and Clojure don't have threads in the common meaning of the term (http://en.wikipedia.org/wiki/Thread_(computing)). They are reactions against programming with threads.

You can expand the term "thread" to mean "concurrency in general" but even then it isn't true that the main purpose of writing concurrent code is clarity. When people say "look at this concurrent program I wrote" they rarely [1] say "look at how well the code expresses the problem". What they overwhelmingly say is "look at this benchmark".

[1] Joe Armstrong talks about how Erlang lets you represent processes more like they happen in the real world. But that's a niche view. Most people think about concurrency as a platform issue, where the platform is multicore hardware or distributed systems, and otherwise wouldn't bother with it.

Clojure absolutely has threads, in the common meaning of the term. I use futures all over the place in my code, and it's quite idiomatic.

    (future (do (foo 1) (bar 2)))
runs the expression on another thread. Futures intentionally compose well with the built-in Clojure concurrency tools.

Clojure has no dislike of threads. It scorns locks, and code that is safe in one thread, but unsafe in multi-threaded situations.

Ah, thanks for the correction and teaching me something.

Erlang and Clojure are multithreaded. Clojure offers full access to Java-style (Thread. (fn [] ...)), j.u.c Executors, and Clojure-specific threadpools for agents and futures, including differentiation between CPU and IO intensive threads. Erlang's scheduler is also threaded, though its threads aren't 1:1 with pthreads. That's fairly common: plenty of libraries and languages map green threads to physical threads for highly concurrent programs.

Also, concurrency provides some performance gains that are not related to parallelism, such as efficient use of the CPU while waiting on I/O.
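A rough Go illustration of that point, with `time.Sleep` standing in for a blocking I/O call. The `waitAll` function is hypothetical, and real I/O latency will of course vary; the sketch only shows that waits can overlap even on a single core.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// waitAll starts n goroutines that each "wait on I/O" for ms milliseconds
// (simulated with time.Sleep) and returns the total elapsed milliseconds.
// It pins the runtime to one executing thread to show that the gain does
// not come from parallelism.
func waitAll(n, ms int) int64 {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(time.Duration(ms) * time.Millisecond) // simulated I/O wait
		}()
	}
	wg.Wait()
	return time.Since(start).Milliseconds()
}

func main() {
	elapsed := waitAll(10, 50)
	fmt.Println("10 simulated 50ms waits took about", elapsed, "ms")
}
```

Done one after another, the ten waits would take roughly 500 ms. Run concurrently they overlap, so the total stays close to a single wait, because a sleeping goroutine does not occupy the CPU.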

I would love to see a video of the actual talk. I have searched for it multiple times but I couldn't find anything.

We are still waiting on the production company that Heroku used to deliver the goods, believe it or not.

I thought he was just giving an excuse for the abysmal multi-core scaling of idiomatic Go programs.

Go scales quite well across multiple cores iff you decompose the problem in a way that's amenable to Go's strategy. Same with Erlang.

No one is making "excuses". It's important to understand these problems. Not understanding concurrency, parallelism, their relationship, and Amdahl's Law is what has Node.js in such trouble right now.
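For reference, Amdahl's Law bounds the speedup of a program in which only a fraction p of the work can be parallelized: speedup = 1 / ((1 - p) + p/n) on n processors. A small Go sketch (the function name is made up for this example):

```go
package main

import "fmt"

// amdahlSpeedup returns the theoretical speedup for a program in which a
// fraction p (0 to 1) of the work can be spread across n processors and
// the remaining 1-p must run serially.
func amdahlSpeedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	for _, n := range []int{2, 8, 64, 1024} {
		fmt.Printf("p=0.90 n=%4d speedup=%.2f\n", n, amdahlSpeedup(0.90, n))
	}
}
```

With p = 0.9 the speedup can never reach 10 no matter how many cores are added, which is why decomposing the problem to shrink the serial part matters more than the raw core count.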

Trouble? Node.js has linear speedup over multiple cores for web servers. See http://nodejs.org/docs/v0.8.4/api/cluster.html for more info.

You know, ryah; I like your work ethic, I like your enthusiasm, I think you're a cool guy and it's great your project has so much traction.

But you say things like this and it worries me. Because a lot of people look up to you and either you said this because you feel defensive about your project or you said it because you genuinely don't understand the cases we're talking about here. And this is a problem because a lot of people look up to you and what you say, so when you say something as baffling as this response, you run the risk of leading a lot of people astray.

I was sort of at a loss for how to reply in the time I have to spare for Hacker News, but thankfully Aphyr did for me. But let me clarify what I said a bit, since I was a bit terse: The problem Node.js has is a social one. A lot of node hackers take the stance, "I thought Node.js solved the problems threads presented," (https://groups.google.com/d/msg/nodejs/eVBOYiI_O_A/kv6iiDyy9...) like there is a single axis of superiority and Node.js sits above the methods that came before. But the reality is that Node.js is really just another possible implementation in the Ruby/Python/Pike/-Perl-¹ space, and shares most of the same characteristics as those languages.

So you have a lot of people who are aces at front-end programming in the browser thinking they have a uniformly superior tool for tackling high-performance server problems, but really they don't have that; they just have a tool with familiar syntax. And so they fearlessly (and perhaps admirably) charge into the breach of platform programming without realizing that the way people scale big projects involves a lot of tools, a lot of thought about failure modes, and a lot of well-established algorithms with very specific tradeoffs.

And so this is Node.js's problem. It's just another gun in the gunfight, but its community thinks it's a cannon. In a world where high-performance parallel VMs like Java or Erlang have very powerful and helpful languages like Clojure or Scala on top, we're in a funny situation. It becomes increasingly difficult to justify all these GIL-ridden implementations of languages.

Which is not to say these implementations don't have their place (and Node.js is hardly the first javascript implementation outside of a browser), but increasingly they are losing their place in the pieces of your code expected to shuffle data around efficiently in the backend of modern distributed applications.

¹ Correction: Perl 6 doesn't plan to share this behavior. What I read suggests it's not done yet.

Node is popular because it allows normal people to do high concurrency servers. It's not the fastest or leanest or even very well put together - but it makes good trade offs in terms of cognitive overhead, simplicity of implementation, and performance.

I have a lot of problems with Node myself - but the single event loop per process is not one of them. I think that is a good programming model for app developers. I love Go so much (SO MUCH), but I cannot get past the fact that goroutines share memory or that it's statically typed. I love Erlang but I cannot get past the syntax. I do not like the JVM because it takes too long to start up and has a bad history of XML files and IDE integration - which give me a bad vibe. Maybe you don't care about Erlang's syntax or static typing but this is probably because you're looking at it from the perspective of an engineer trying to find a good way to implement your website today. This is the source of our misunderstanding - I am not an app programmer arguing about the best platform to use for my website--I'm a systems person attempting to make programming better. Syntax and overall vibe are important to me. I want programming computers to be like coloring with crayons and playing with duplo blocks. If my job was keeping Twitter up, of course I'd be using a robust technology like the JVM.

Node's problem is that some of its users want to use it for everything? So what? I have no interest in educating people to be well-rounded pragmatic server engineers, that's Tim O'Reilly's job (or maybe it's your job?). I just want to make computers suck less. Node has a large number of newbie programmers. I'm proud of that; I want to make things that lots of people use.

The future of server programming does not have parallel access to shared memory. I am not concerned about serialization overhead for message passing between threads because I do not think it's the bottleneck for real programs.
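For readers unfamiliar with the style being argued for here, this is roughly what "message passing instead of parallel access to shared memory" looks like in Go. It is only a sketch with a made-up function name, not a claim about how Node or any particular server is implemented.

```go
package main

import (
	"fmt"
	"sync"
)

// sumViaMessages has several workers send increments to a single owner
// goroutine. The running total is touched only by the owner, so there is
// no mutex and no shared mutable variable.
func sumViaMessages(workers, perWorker int) int {
	incs := make(chan int)
	result := make(chan int)

	// Owner goroutine: the only code that ever reads or writes total.
	go func() {
		total := 0
		for n := range incs {
			total += n
		}
		result <- total
	}()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				incs <- 1 // communicate instead of sharing
			}
		}()
	}
	wg.Wait()
	close(incs) // no more messages: lets the owner finish and report
	return <-result
}

func main() {
	fmt.Println(sumViaMessages(4, 1000))
}
```

The cost of this design is one channel operation per update, which is the message-passing overhead the comment above argues is rarely the bottleneck in real programs.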

"I love Erlang but I cannot get the past the syntax."

I cannot understand this hangup about syntax. Syntax is the easiest thing to learn with a new language, you just look it up. It's the semantics and usage where the real problems are.

Erlang doesn't have static typing, it is dynamically typed. It has always been dynamically typed.

And however you look at it doing highly concurrent systems with processes is much easier. And who says that processes imply "parallel access to shared memory"? Quite the opposite actually.

syntax is the user interface of the language. it is one of the parts you have to deal with on a constant basis, both writing and reading code. i can well see how an unpleasant syntax can be a constant, grating annoyance when dealing with a language. (i happen to really like erlang's syntax, but it definitely has very different aesthetics from javascript's)

I think it would be dangerous if Erlang's syntax were to resemble that of another language like Java or C. Especially for beginners. That is because if things look the same you expect them to behave the same, and Erlang's semantics is very different from those languages whose syntax people think it should be like.

There is no getting around the difference in semantics. For this I think that having a different syntax is actually better. Also there are things in Erlang which are hard to fit syntactically into the syntax of OO languages, for example pattern-matching.

Honestly? Erlang's syntax is not that bad. It's not great in that it has prolog legacy with uncanny valleys to C-legacy left and right, but it's effective.

Really the only really tricky thing for a newbie is strings.

these things are very much a matter of taste. what does "effective" mean with respect to syntax? for instance, lispers will argue that their syntax is effective because of the things it lets you do, but that's orthogonal to whether it's actually a pleasant syntax to program in, which comes down to personal taste[1]. likewise, i love mlish syntax, but opa, a language steeped in ml semantics, nonetheless switched to something more javascripty for their main syntax because it proved more popular[2]. and read through the wrangling over ruby versus python sometime - the two languages are very similar under the hood, but their respective syntaxes are one of the things proponents of each language complain about when they have to use the other.

[1] as larry wall famously said, "lisp has all the visual appeal of oatmeal with fingernail clippings mixed in."

[2] http://blog.opalang.org/2012/02/opa-090-new-syntax.html

It takes a lot of chutzpah for Larry Wall to say anything critical about Lisp syntax. Larry Wall. Come on.

I'm not the author of the comment above, but I think Erlang's syntax is effective in that it strongly emphasizes computation by pattern matching. If you write very imperative code in it (as people tend to, coming from Ruby or what have you), yes, it will look gnarly. Good Erlang code looks qualitatively different. There are pretty good examples of hairy, imperative Erlang code being untangled in this blog post: http://gar1t.com/blog/2012/06/10/solving-embarrassingly-obvi...

The . , ; thing is a bit of a hack, admittedly -- I suspect that comes from using Prolog's read function to do parsing for the original versions of Erlang (which was a Prolog DSL), and reading every clause of a function definition at the same time. Prolog ends every top-level clause with a period, not "; ; ; ; .". (Not sure, but a strong hunch, supported by Erlang's history.) I got used to it pretty quickly, though.

As in Prolog, ',' and ';' are separators: ',' behaves like an and (first do this and then do this, as in Prolog), while ';' is an or (do this clause or do this clause, again as in Prolog). '.' ends something, in this case a function definition. Erlang's function clauses are not the same as Prolog's clauses, which explains the difference.

It is very simple really: think of sentences in English and it all becomes trivially simple. How many sentences end in a ';'? And you almost never need to explicitly specify blocks.

Here http://ferd.ca/on-erlang-s-syntax.html are some alternate ways of looking at it.

Indeed. It makes sense to me. Are my intuitions about the origins of ;s separating top-level clauses accurate?

(Hello, Robert! :) )

Sort of. The syntax evolved at the same time we were moving from Prolog onto our own implementation, which forced us to write our own parser and not rely on the original Prolog one. The biggest syntax change came around 1991, since then it has been mainly smaller additions and adjustments.

To be fair, that was Larry Wall reacting to criticism in 1994 https://groups.google.com/forum/?fromgroups#!msg/comp.lang.l...

Anyways, my experience is that all languages will get people criticizing them. And, in my experience, those kinds of criticisms should almost always be categorized as "does not want to talk about language FOO" and a proper response is probably something like "if you don't want to give that subject the respect it deserves, let's change the subject to something you find interesting".

Erlang syntax may be jarring, but one benefit of it is that there is actually very little syntax relative to languages like Java, C++, Python. There isn't that much to hold in your head. I can switch between Erlang and another language pretty effortlessly because the context switch is so small.

Hi. This mentality you have? I disagree.

Syntax benefits are not all about subjectivity. Anyone claiming this has effectively lost the plot and decided to go turtle in the discussion.

FWIW I believe "Maybe you don't care about Erlang's syntax or static typing" was referring to his earlier mention of Go as statically typed. A comma would have improved the clarity :-)

"I love Erlang but I cannot get the past the syntax."

I also thought this. For some reason though the more I use it the more I am coming around to its syntax. There are lots of gotchas when compared with other popular languages, but still its syntax has grown on me recently.

At the opposite end of the syntactic spectrum is Lisp. The more I use Clojure the more I am loving it as well.

"The future of server programming does not have parallel access to shared memory."

I agree. The future of server programming is also not spawning multiple child processes. The question I have is how far off is that future? I know that most of my web applications today simply run in multiple spawned OS processes.

Barring a massive shift in hardware architectures, shared access by cooperating threads is your only option for high-performance shared state. Look at core counts vs core flops for the last five years. Look at Intel's push for NUMA architectures. This is a long-scale trend forced by fundamental physical constraints with present architectures, and I don't see it changing any time soon.

Anyone telling you shared state is irrelevant is just pushing the problem onto someone else: e.g., a database.

"The future of server programming does not have parallel access to shared memory."

What about STM in Clojure? It's technically parallel access to shared memory, but the transactional nature obviates the need for mutexes and all the crap that makes shared memory a pain in the ass.

Erlang's syntax is one of those things that's just too hard to get past. Along with the other obstacles, it makes Erlang nearly impossible to attain!


Beware of "Kenneth the Swede"! Just stay on the straight and narrow and you will reach the Celestial City.

Wait, so, you wrote off the JVM because of XML and IDEs? Really?

There are a ton of languages for Erlang's BEAM.

My favorite: Joxa a Clojure inspired (really just inspired ;)) lisp http://joxa.org/

Then there are Elixir (mentioned already) and Reia; both seem to be inspired by Ruby. http://reia-lang.org/ http://elixir-lang.org/

Erlang has a far superior computational model and implementation than Node; it can handle way more requests faster and is more newcomer-friendly, as any web request is simply a message received.

I do not see how you can simultaneously care about making programming better but not care about how people use what you're making to solve that problem.

Programming is a verb describing what people do with programming systems, and we're nowhere near the point where it can be done automatically yet. You cannot remove people from the equation and claim to be interacting with it.

I think he's saying he wants to make programming better for newbies.

It really kinda sucks for them right now.

Serious hard-core engineers that need serious tools are actually pretty well served by current tools. No, no, they're not perfect. But we're way better off than somebody who's just beginning in terms of what tools are aimed at us.

Agreed, and as computing grows messaging will just get cheaper and cheaper. zeromq is a great example of that, 5 million messages per second over TCP on a macbook air is not half bad.

Personally I like static typing, especially if it's somewhat optional. It's something we effectively do through documentation anyway (via JSDoc or similar), but typing makes it concrete.

I don't think lightweight threads sharing memory are so bad. Symmetric coroutines are more or less the same as an event loop IMO: the thought put into working with them is more or less identical, just without callback hell and odd error handling. But I suppose going all-out with message passing could be fine. I think that's still a bit of an implementation detail unless you get rid of the concept of a process altogether and just have a sea of routines that talk to each other.

I think the reaction from many long-term programmers who have switched technologies, careers, frameworks, and languages is to the "...use it for everything?" mantra. I applaud your goals and the success at getting more people to program and build things. That's what we really do need. Let's hope this audience is reading HN with an open mind and figures out, as many here have, that the problem to be solved is the most important decision, not the tool.

Thank you for the nice write-up.

If you like Erlang but cannot get past its syntax you might want to give Elixir a try.


Syntax looks very Ruby-like.

Is that good or bad? I personally don't mind Erlang syntax at all, or the syntactic overhead you have to write for a gen_server that does nothing. Elixir saves you some of that overhead. It's kind of like CoffeeScript for Erlang. It's fully compatible, since it compiles to Erlang AST, which compiles to BEAM. You can do "everything" with Elixir that you can do with Erlang. Of course, you have to understand how to work with Erlang/OTP to actually benefit.

I admire Vert.x for trying to bring deployability to the separate-non-shared-event-loop world (aka where Node is): Vert.x has "verticles," which are instantiated multiple times but share no data. It's very similar to Node's cluster execution, except Vert.x is a thorough answer to the deployment problem (going so far as to bake in Hazelcast if you want to scale out from one multi-threaded process to multiple, likely cross-machine, processes).

Yet Node itself cannot and should not solve the deployment problem: Node is a JavaScript runtime and, contrary to earlier claims, I'd declare it not opinionated, not one to make this decision for us. The scaling-out story is indeed not easy: even tasks like validating session credentials need to be mastered by the team (persist creds into a data store, or use group communication/pubsub; building for Node is a lot like building for multi-machine Java). The level of DIY-it-ness here is indeed colossal.

What I'd contrast against your view (and I agree with most of your premise, that Node is extremely seductive and dangerous and many are apt to get in way, way, way over their head) is that the comforts you describe are what kill these other languages, what strangle us and prevent us from becoming better, more understanding programmers. Ruby, Python, PHP, less so Perl: the webdev that goes on there happens by and large at extreme levels of abstraction. Developers flock to the known, explored center, the tools with the most, the places that seem safest.

The dangerous, dangerous scenario presented by most good web development tools is that it is the tools that know how to run things. Contrary to the charge-into-the-breach, throw-up-ad-hoc-platforms-in-production-every-day mentality (of Node), these (Ruby, PHP, Python) platforms stagnate. They fall to ruin as their tooling strives towards ever greater heights: the tools accrue more and more responsibility, there are better carved-out and expected ways to do things, and incidental complexity (the scope of what must be known, how far one has to travel to get from writing a page to it getting shipped over the wire or executing) balloons.

If anything, Node's core lesson to the world has been about how much is not required. Connect, the only and extremely, extremely low-lifed common denominator of the Node web world, is the meagerest, tiniest, smallest iota of a pluggable middleware system (if only Senchalabs had been courteous enough to be more up front about it being a complete and total rip-off of Commons-Chain, and to not add a thing, I would not bloody loathe it). That pattern? bool execute(Context context). Did you handle this request? No? OK, next. You need to deploy a bunch of processes on a bunch of boxes? You can probably write up something perfectly adequate in a week. Don't have a week? Go find a module: certainly Substack has at least one for whatever your cause (here it's Fleet, https://github.com/substack/fleet).
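
For anyone who hasn't seen the "did you handle this request? No? OK, next" pattern written down, here is a minimal Python sketch of it. It follows the shape the comment gives (a handler returns true when it has handled the context), and it is not Connect's or Commons-Chain's actual API; `run_chain`, `serve_static` and `not_found` are made-up names for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration: a context is a plain dict,
# and a handler returns True when it has fully handled the request.
Context = Dict[str, str]
Handler = Callable[[Context], bool]


def run_chain(handlers: List[Handler], context: Context) -> bool:
    """Offer the context to each handler in order; stop at the first that handles it."""
    for handler in handlers:
        if handler(context):
            return True
    return False  # nobody handled the request


def serve_static(context: Context) -> bool:
    """Handle only paths under /static/; otherwise pass to the next handler."""
    if context["path"].startswith("/static/"):
        context["response"] = "file:" + context["path"]
        return True
    return False


def not_found(context: Context) -> bool:
    """Catch-all handler placed at the end of the chain."""
    context["response"] = "404"
    return True
```

Calling `run_chain([serve_static, not_found], {"path": "/static/app.js"})` stops at the first handler; any other path falls through to `not_found`. The whole "framework" is the loop in `run_chain`, which is the point being made about how little is required.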

Node modules are wonderful. They all have some varyingly long list of dependencies, usually the tree is 4-8 different things, but the total amount of code being executed from any given module is almost always short of a couple dozen KB: your engineering team can come in and understand anything in a day or three, and gut it and rebuild it in another day or two. Modules, unlike how development processes have shaped up in the past decade, are wonderfully, delightfully stand-alone: there are no frameworks, no crazy deployment systems, no bloody tooling one is writing to; it's just a couple of functions one can use. The surface area, what is shown, is tiny, is isolated, is understandable; there's no great deep mesh. This runs so contrary to the Drupal, to the Rails, to the Cakes or Faces of the world, where one is not writing a language: they're at the eighth degree of abstraction, writing tools for a library that implements enhancements for a framework that is a piece of an IoC container that runs on an application server that runs on a web server that runs in a runtime that actually does something with the OS.

We need to get more developers willing to charge into the breach and break out a gun fight. This stuff is not that complicated, and the tools we have are hiding that fact from us more often than not.

So, I admire and love approaches like Vert.x that take the reactor pattern (what powers Node) and blow it up to the nth degree, that solve deployment challenges. But at the same time I don't think there is a huge amount of magic there: most Node developers have not advanced their runtimes to the level of parity that is called for yet, but I do not see this as a colossal problem. Node, shockingly, even when hideously under-tooled, under-supported, and under-op'ed, seems to stand up and not fall over in a vast number of cases. It's a problem of the rich, a good problem to have, when your Node system is having worrisome performance problems: most projects will not scale this big, Node will just work, and hopefully you have enough actual genuine talent with enough big-picture understanding etc. on your side to not be totally frozen out when your traffic goes up 10x in two days and no one anticipated it. Node is not a land for hand-holding, and I don't think that's a bad thing: I think it'll help us listen better to our machines, to not follow our tooling's lead into the breach, but to consider what it is we really are building for ourselves.

It's parallel in the same sense that any POSIX program is: Node pays a higher cost than real parallel VMs in serialization across IPC boundaries, not being able to take advantage of atomic CPU operations on shared data structures, etc. At least it did last time I looked. Maybe they're doing some shm-style magic/semaphore stuff now. Still going to pay the context switch cost.

it's all serialization - but that's not a bottleneck for most web servers. i'd love to hear your context-switching free multicore solution.

this is the sanest and most pragmatic way to serve a web server from multiple threads

Threads and processes both require a context switch, but on POSIX systems the thread switch is considerably less expensive. Why? Mainly because the process switch involves changing the VM address space, which means a TLB shootdown: all that hard-earned cache has to be fetched from DRAM again. You also pay a higher cost in synchronization: every message shared between processes requires crossing the kernel boundary. So not only do you have higher memory use for shared structures and higher CPU costs for serialization, but also more cache churn and context switching.

it's all serialization - but that's not a bottleneck for most web servers.

I disagree, especially for a format like JSON. In fact, every web app server I've dug into spends a significant amount of time on parsing and unparsing responses. You certainly aren't going to be doing computationally expensive tasks in Node, so messaging performance is paramount.
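
To make the serialization point concrete, here is a rough Python sketch (not taken from the linked gists) that contrasts handing a message over by reference, as threads in one VM can, with round-tripping it through JSON, as it must be when crossing a process boundary. The function names and the message shape are assumptions for illustration, and the timings will vary by machine.

```python
import json
import time


def send_serialized(messages):
    """Simulate crossing a process boundary: encode each message to JSON text, then decode it."""
    return [json.loads(json.dumps(message)) for message in messages]


def send_by_reference(messages):
    """Simulate an in-process handoff: the receiver gets the very same objects, no encoding."""
    return list(messages)


def time_call(func, messages):
    """Return the wall-clock seconds one call to func(messages) takes."""
    start = time.perf_counter()
    func(messages)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical workload: 100,000 small dict messages.
    messages = [{"id": i, "body": "hello"} for i in range(100_000)]
    print("serialized  :", time_call(send_serialized, messages))
    print("by reference:", time_call(send_by_reference, messages))
```

The serialized path allocates a fresh copy of every message and parses text each time; the by-reference path copies nothing. It leaves out the kernel crossing entirely, so if anything it understates the IPC cost.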

i'd love to hear your context-switching free multicore solution.

I claimed no such thing: only that multiprocess IPC is more expensive. Modulo syscalls, I think your best bet is gonna be n-1 threads with processor affinities, taking advantage of CAS/memory-fence capabilities on modern hardware.

this is the sanest and most pragmatic way to serve a web server from multiple threads

What is this I can't even.

Note: I think I'm wrong about these types of process switches requiring a TLB shootdown. I think it's just cache invalidation.

Don't believe me? Try it:

Node.js: https://gist.github.com/3200829

Clojure: https://gist.github.com/3200862

Note that I picked really small messages here (integers) to give Node the best possible serialization advantage.

    $ time node cluster.js 
    Finished with 10000000

    real 3m30.652s
    user 3m17.180s
    sys	 1m16.113s

Note the high sys time: that's IPC. Node also uses only 75% of each core. Why?

    $ pidstat -w | grep node
    11:47:47 AM     25258     48.22      2.11  node
    11:47:47 AM     25260     48.34      1.99  node

That's about 96 voluntary context switches per second across the two processes.

Compare that to a multithreaded Clojure program that uses a LinkedTransferQueue and easily eats 97% of each core. Note that the times here include ~3 seconds of compilation and JVM startup.

    $ time lein2 run queue
    "Elapsed time: 55696.274802 msecs"

    real 0m58.540s
    user 1m16.733s
    sys	 0m6.436s

Why is this version over 3 times faster? Partly because it requires only 4 context switches per second.

    $ pidstat -tw -p 26537
    Linux 3.2.0-3-amd64 (azimuth) 	07/29/2012 	_x86_64_	(2 CPU)
    11:52:03 AM      TGID       TID   cswch/s nvcswch/s  Command
    11:52:03 AM     26537         -      0.00      0.00  java
    11:52:03 AM         -     26540      0.01      0.00  |__java
    11:52:03 AM         -     26541      0.01      0.00  |__java
    11:52:03 AM         -     26544      0.01      0.00  |__java
    11:52:03 AM         -     26549      0.01      0.00  |__java
    11:52:03 AM         -     26551      0.01      0.00  |__java
    11:52:03 AM         -     26552      2.16      4.26  |__java
    11:52:03 AM         -     26553      2.10      4.33  |__java

And queues are WAY slower than compare-and-set, which involves basically no context switching:

    $ time lein2 run atom
    "Elapsed time: 969.599545 msecs"

    real 0m3.925s
    user 0m5.944s
    sys	 0m0.252s

    $ pidstat -tw -p 26717
    Linux 3.2.0-3-amd64 (azimuth) 	07/29/2012 	_x86_64_	(2 CPU)

    11:54:49 AM      TGID       TID   cswch/s nvcswch/s  Command
    11:54:49 AM     26717         -      0.00      0.00  java
    11:54:49 AM         -     26720      0.00      0.01  |__java
    11:54:49 AM         -     26728      0.01      0.00  |__java
    11:54:49 AM         -     26731      0.00      0.02  |__java
    11:54:49 AM         -     26732      0.00      0.01  |__java

TL;DR: Node.js IPC is not a replacement for a real parallel VM. It allows you to solve a particular class of parallel problems (namely, those which require relatively infrequent communication) on multiple cores, but shared state is basically impossible and message passing is slow. It's a suitable tool for problems which are largely independent and where you can defer the problem of shared state to some other component, e.g. a database. Node is great for stateless web heads, but is in no way a high-performance parallel environment.
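
For readers who want the shape of the two in-process approaches without opening the gists, here is a rough Python sketch. The gists are not reproduced here, so the workload (summing integers) is an assumption. Python also exposes no hardware compare-and-set, so `AtomicCounter` emulates one with a lock, and CPython's GIL means this shows the structure of the two designs rather than their real performance.

```python
import queue
import threading


class AtomicCounter:
    """Hypothetical stand-in for an atomic cell; a lock emulates the CAS instruction."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        """Set the value to new only if it still equals expected; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def add_with_cas(counter, amount):
    """Classic CAS retry loop: read, compute, try to swap, retry if someone else won."""
    while True:
        current = counter.get()
        if counter.compare_and_set(current, current + amount):
            return


def run_workers(target, n, workers):
    """Start one thread per worker, each given a disjoint slice of 0..n-1."""
    threads = [threading.Thread(target=target, args=(range(i, n, workers),))
               for i in range(workers)]
    for t in threads:
        t.start()
    return threads


def sum_with_cas(n, workers=2):
    """Workers update one shared counter directly; no messages are exchanged."""
    counter = AtomicCounter()

    def work(values):
        for v in values:
            add_with_cas(counter, v)

    for t in run_workers(work, n, workers):
        t.join()
    return counter.get()


def sum_with_queue(n, workers=2):
    """Workers send every value through a queue; one consumer does the adding."""
    q = queue.Queue()

    def produce(values):
        for v in values:
            q.put(v)
        q.put(None)  # sentinel: this producer is done

    threads = run_workers(produce, n, workers)
    total, finished = 0, 0
    while finished < workers:
        item = q.get()
        if item is None:
            finished += 1
        else:
            total += item
    for t in threads:
        t.join()
    return total
```

Both functions compute the same sum. The queue version pays for a handoff (and potentially a blocked thread) per message, while the CAS version only retries on contention, which is the difference the timings above are measuring on the JVM.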

Yep, serialized message passing between threads is slower - you didn't have to go through all that work. But it doesn't matter because that's not the bottleneck for real websites.

Also Node starts up in 35ms and doesn't require all those parentheses - both of which are waaaay more important.

It is definitely a bottleneck I face daily in designing CrowdFlower. Our EC2 bill is much higher than it could be because we have to rely so much on process-level forking.

A great example of this is Resque. It'd be great if we could have multiple Resque workers per process per job type. This would save a ton of resources and greatly improve processing for very expensive classes of jobs. It's a very real-world consideration. But instead, because our architecture follows this "share nothing in my code, pass that buck to someone else" model like a religion, we waste a lot of computing resources and lose opportunities for better reliability.

What I find most confusing about this argument: I challenge you to find me a website written in your share-nothing architecture that, at the end of the day, isn't basically a bunch of CRUD and UX chrome around a big system that does in-process parallelism for performance and consistency reasons. PostgreSQL, MySQL, ZooKeeper, Redis, Riak, DynamoDB... all these things are where the actual heavy lifting gets done.

Given how pivotal these things are to modern websites, it's bizarre for you to suggest it is not something to consider.

It's more than that. Several processes with small, independently garbage-collected heaps are not as efficient as a single process with a large heap, parallel threads, and a modern concurrent GC (e.g. the JVM's ConcurrentMarkSweep GC).

In addition to that, processes severely inhibit the usefulness of in-process caches. Where threads would allow a single VM to have a large in-process cache, processes generally prevent such collaboration and mean you can only have multiple, duplicated, smaller in-process caches. (Yes, you could use SysV shared memory, but that's also fraught with issues.)

The same goes for any type of service you would like to run inside a particular web server that could otherwise be shared among multiple threads.
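
A small Python sketch of the cache point, under stated assumptions: `SharedCache` and the squaring "loader" are made up for illustration, and separate cache instances stand in for separate worker processes rather than actually forking.

```python
import threading


class SharedCache:
    """A tiny in-process cache guarded by one lock (hypothetical, for illustration)."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._data = {}
        self.loads = 0  # how many times the expensive loader actually ran

    def get(self, key):
        # Holding the lock while loading keeps the sketch simple and
        # guarantees each key is loaded at most once.
        with self._lock:
            if key not in self._data:
                self._data[key] = self._loader(key)
                self.loads += 1
            return self._data[key]


def loads_with_shared_cache(workers, keys):
    """All worker threads share one cache, as threads in a single VM can."""
    cache = SharedCache(lambda k: k * k)

    def work():
        for k in keys:
            cache.get(k)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache.loads


def loads_with_private_caches(workers, keys):
    """Each worker has its own cache, standing in for one cache per process."""
    caches = [SharedCache(lambda k: k * k) for _ in range(workers)]
    for cache in caches:
        for k in keys:
            cache.get(k)
    return sum(cache.loads for cache in caches)
```

With 4 workers and 10 keys, the shared cache runs the loader 10 times in total, while the private caches run it 40 times and hold 4 copies of the same data.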

I prefer someone keep rolling for sanity loss & resume the work on isolates!

Web serving is OK & all, but I'd love if node could be an ideal runtime for petri-nets and webworker meshes too.
