
This is actually the opposite pattern: instead of pretending everything is a local function even over the network (which turned out to be a bad idea), what if we did it the other way around? Pretend your components are communicating over a network even when they aren't. This is made possible by very efficient lightweight threads (goroutines in Go, gevent in Python, etc.) and message-oriented patterns popularized by Go channels (and Erlang/OTP before it).

Now your application always checks for IO errors, and the underlying "plumbing" is exposed and available for the developer to tweak at will: timeouts, caching, failover, fan-in, fan-out, etc. become programmable components just like the rest of your app.

TLDR: this is not born-again RPC. It's the anti-RPC.




Pretend your components are communicating over a network even when they aren't. This is made possible by very efficient lightweight threads (goroutines in go, gevent in Python etc.) and message-oriented patterns popularized by Go channels (and Erlang/OTP before it).

It's good you mentioned Erlang. What you're describing >is< Erlang/OTP. The language is structured such that you're always somewhat paying a price for fault tolerant, concurrency safe, and parallelizable distribution. It turns out that this makes for a nifty functional programming language and, surprise surprise, the result is great for fault tolerant, concurrency safe, and parallelizable distribution.


Pretend your components are communicating over a network even when they aren't

Now your application always checks for IO errors, and the underlying "plumbing" is exposed and available for the developer to tweak at will: timeouts, caching, failover, fan-in, fan-out, etc

The big problem is that the bandwidth and latency numbers are vastly different over an actual network vs between processes or OS threads on a single machine vs between green threads within a process.

The problem is that sometimes you require high performance, to a degree that is not possible across an actual network link. And if your coding style doesn't distinguish between an actual network link and an imaginary in-process link (or worse, deliberately makes them indistinguishable and silently interchangeable), sooner or later someone will refactor it or change the config file or something and your microsecond-scale latency that you were assuming and relying on has suddenly become multiple-millisecond latency and everything grinds to a halt.


And if your coding style doesn't distinguish between an actual network link and an imaginary in-process link...

If you start looking into the ways that parallelism can become highly inefficient on current multicore architectures, you'll find an inter-core/inter-socket memory hierarchy and communication barrier that isn't well established in the mainstream consciousness of programmers. It turns out that there is often far less of a difference between an actual network link and an imaginary in-process link than a naive programmer might believe, and the conditions that cause this can result from a subtle interplay between multiple hardware and software mechanisms.

Here's one - http://en.wikipedia.org/wiki/False_sharing

(or worse, deliberately makes them indistinguishable and silently interchangeable)

Erlang/OTP actually makes this tradeoff, much to its advantage. In fact, the multicore pathologies I refer to above make the tradeoff more attractive.

sooner or later someone will refactor it or change the config file or something and your microsecond-scale latency that you were assuming and relying on has suddenly become multiple-millisecond latency and everything grinds to a halt.

What you describe is either a poorly managed shop or a poorly conceived programming environment. Either the behavior of such a system should be one of the 7 or so things you must know to program it, or the environment should make it glaringly awkward to rely on something as a synchronous call.


You're reciting an obfuscated tautology -- hard things are hard.

But most things are not hard. Hard things can be dealt with if and when they come up through documentation and training. Dealing with them by making easy things equally hard is silly.


You're reciting an obfuscated tautology -- hard things are hard.

Try on this analogy: Moving in deep blizzard conditions is hard. Using vehicles with tracks and skids would make that easier, but that's obviously not a sensible and useful vehicle because it would suck for driving on dry highways.

But most things are not hard. Hard things can be dealt with if and when they come up through documentation and training. Dealing with them by making easy things equally hard is silly.

You could just as easily apply this reasoning to goto statements. Also to memory management. I'm not saying you don't have a point here -- I'm on board with "the right tool for the job" -- but your analysis could be a bit more nuanced.


I don't think you understand what my reasoning is, because it would generally counsel against goto and manual memory management, which are harder tools for solving harder problems, and are usually unnecessary.


I don't think you understand what my reasoning is

That you shouldn't restrict the use of certain tools/constructs/features in order to make areas like concurrency easier. Apparently you misunderstood my analogy. For one thing, it is an analogy, not a claim that you would use goto and manual memory management.

would generally counsel against goto and manual memory management, which are harder tools for solving harder problems,

That is a matter of scale. At small scales, "just using a goto" seems easier. It's only at larger scales that it turns into untenable spaghetti and so becomes harder. Herein is another analogy which can be related to concurrency and parallelism.


How am I saying that?

In-thread control transfers vs in-system IPC vs network links all have very different performance (bandwidth/latency) profiles.

The language should not encourage people to conflate them. They behave differently enough that making them indistinguishable is not a sane abstraction.

This does not mean that they need to have wildly different interfaces. Just like 'int' and 'double' don't have wildly different interfaces, but your code still needs to specify which it's using.


If you're treating abstractions as black boxes, I can see how you would have a problem. Otherwise, I don't see one.

When performance matters, specify/document the performance characteristics of your modules and their deployment proximity requirements. This is hardly unusual; I do it all the time.


Essentially, instead of fighting against concurrency, by building big fancy black boxes of state, we now embrace concurrency, and let it permeate through the (green) fabric of the program.


This isn't actually true of Go channels, though. They don't have a way of reporting I/O errors (because no I/O), and they're often synchronous.


You're totally right. That's why we call it "like" Go channels; this is one of the differences. (Another one is that you can't map arbitrary Go types: there is only 1 return channel and fd allowed per message, at least for now.)



