Hacker News
Playframework: Async, Reactive, Threads, Futures, ExecutionContexts (sadache.tumblr.com)
73 points by sadache on Feb 5, 2013 | hide | past | favorite | 35 comments



The more issues highlighted with the Play framework, the better. I find it a great piece of kit to play around with.

I've not come across much commentary on the 'bad' ideas inside the framework that I should watch out for when using it.


So what am I paying for? I wasn't born yesterday; I know what the terms mean. Get to the point because I don't have all day to look at some guy holding his hat on.


My understanding was that pages which are inherently non-blocking could end up blocked, because of shared thread pools and because, by default, everything runs on the same execution contexts.

I would love to understand if this is a problem in practice, or if there are other practical implications.


It's definitely a real, common problem, but I think the article obscures it a bit. Blocking I/O operations run on the Akka dispatcher will block that thread while the operation executes. Doing this too often will cause all the Akka threads to be blocked, stopping the dispatcher from processing more events. Therefore, one should avoid performing blocking operations on Akka threads.

In my experience, one of two things will happen in practice:

1) You use an asynchronous library for I/O calls, which will maintain its own threadpool. This is no problem, because the I/O threadpool is independent from the Akka threadpool.

2) You want to use a synchronous library that performs I/O operations using the caller's thread. If you must do this, you should use a separate threadpool (ExecutionContext).


Why do people still continue to push the myth that using an event-driven style, with callbacks and a huge unmaintainable mess, is the only alternative to native threads? Userland threading has been around for a very long time. The only difference between using an event-driven style and using a userland thread lib is that with userland threading the complexity of managing the state of the various computations is handled by the thread lib instead of by you.

The sooner people stop promoting the false dichotomy, the sooner people will realize every language should do concurrency as well as Haskell, and the sooner that will actually happen.


In practical terms what happens is that your clients need to have a real good mental model of what will block and what won't.

Also it turns out event driven programming is fine when you do it, but wait 'till you have curl handles and say mysql handles and blah 'andles and foo andles. Fork andles!

Threading sucks. Mixing event loops also really sucks.

The developer centric way of doing it (e.g. screw performance I just want it to be correct and look nice) is really to flip to message passing and separate "logical processes". That's our unfortunate state of the art.

EDIT: I didn't mention that I think all the userland threading libs I know of are crap. Which ones do you think are good? P.S. I'm not going to be sarcastic or critical about it; I would really like to have portable, usable LWPs in C or C++.


Both suck. Actually I'd take Erlang's processes over either, but if I had to choose between the two, I'd take threads.

> but wait 'till you have curl handles and say mysql handles and blah 'andles and foo andles. Fork andles!

Yup. If I have an event driven program can I use another library that is not built on the same event driven framework?

The answer is usually no. Even if it has the nice, internally composable "futures/deferreds" feature.

An operation that is non-blocking will return some kind of a future/promise/deferred thingy. Those bubble all the way to the top through the API. It means you need to have 3rd party libraries and support code that knows about them.

This is the case with Twisted (which is a nice framework), but it means having a parallel universe of libraries for everything (because they have to return deferreds, or be able to have deferreds passed in).


Thanks. I vigorously agree with you. I really wish I had a better way to explain it to people who haven't been screwed by it yet.


>Yup. If I have an event driven program can I use another library that is not built on the same event driven framework?

This is why my original post suggested that people need to start demanding better of their languages. This is a total non-issue in Haskell because the "event driven framework" (that's really overstating it; they are just relatively simple libs) is part of the language runtime. So everything uses it. I get concurrency and parallelism with the scalability of an event loop just by doing:

    forkIO myFunction


>In practical terms what happens is that your clients need to have a real good mental model of what will block and what won't.

I don't understand. Who are "clients" in this context? You need to know what calls will block and what calls won't regardless of style (event vs thread) unless you are using OS threads or processes.

>Threading sucks.

Why?

>Which ones do you think are good?

The one that comes with ghc.


I think perhaps the communications problem here is a difference in context.

I'm coming from a C/C++ background. I get some Haskell but you'll have to explain specifics to me (I'm certainly willing to change my mind!).

Are you using a framework, or developing one? Who gets to choose the concurrency strategy?

As a framework provider, if you provide LWPs to people (provide open and read and all that stuff for them) so they can write code as if it were sequential while it's actually non-blocking, what happens in C/C++ is that when they need to integrate third-party stuff, life gets really complicated. Clients are the clients of the API I'm talking about, which is a hypothetical "lightweight thread" based API where everything just works, as per the original post.

>> Threading sucks.

> Why?

Well, in an ideal model of computation we wouldn't need to synchronise things. This is the great attraction of, say, a Node-based model where we run our stuff as synchronous events. Once you accept threading you need to care about who does what to which at what time. You also need to figure out e.g. whether your dependencies are thread-safe, and under what conditions. You might have Berkeley DB, where contexts are not generally thread-safe but handles can be if you set the right flags, for example.

>> Which ones do you think are good?

> The one that comes with ghc.

That's cool :) I should probably learn more about it. The really cynical part of my mind is going "please tell me about this magical solution to integrating different event loops and concurrency models" and the less cynical part is open to your suggestions.


>Are you using a framework, or developing one?

No.

>Who gets to choose the concurrency strategy?

The person writing code?

>Clients are the clients of the API I'm talking about, which is a hypothetical "lightweight thread" based API where everything just works as per original post.

In that context, I am the client. Here's how it works:

    forkIO myFunction
That's it. Very simple, scales to millions of "threads", and if compiled with -threaded, those millions of "threads" are spread over a set of OS threads so I also get parallelism.
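A minimal, self-contained sketch of that claim (the thread count and the MVar-based join are my own illustration, not from this thread): fork a large number of green threads and wait for them all to finish.

```haskell
import Control.Concurrent
import Control.Monad (forM_, replicateM_)

-- Spawn 10,000 lightweight threads; each signals completion on an MVar.
-- Compiled with -threaded, GHC multiplexes them over a pool of OS threads.
main :: IO ()
main = do
  done <- newEmptyMVar
  let n = 10000
  forM_ [1 .. n :: Int] $ \_ ->
    forkIO (putMVar done ())       -- each green thread costs only a few KB
  replicateM_ n (takeMVar done)    -- join: wait for every thread to finish
  putStrLn "all threads finished"
```

Without -threaded the same code still runs; it just stays on a single OS thread.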

>The really cynical part of my mind is going "please tell me about this magical solution to integrating different event loops and concurrency models" and the less cynical part is open to your suggestions.

The key is just in the "demanding better from languages" part. You don't have to worry about a dozen incompatible green thread libraries if it is built into the language.


> Why do people still continue to push the myth that using event driven style with callbacks and a huge unmaintainable mess is the only alternative to native threads?

Performance.

It used to be the famous C10K problem, and the best solution was one based on a select/epoll/kqueue loop. That was then (5-10 years ago). HAProxy is still built this way. Nginx is built this way as well. That works OK for smallish applications that are only IO-bound.

Now there is also node.js. People want to use Javascript on the server with node.js (which is fine). But node.js is a single-threaded application, and asynchronous programming is the main way to do it. So a lot of the advantages we are hearing about come down to people liking to admire the technology they use, because it makes them feel better about themselves (the causality doesn't always run the other way -- they don't always pick technologies they like; sometimes they are forced to pick one and then force themselves to like it).

> Userland threading has been around for a very long time. The only difference between using an event driven style and using a userland thread lib is that with userland threading the complexity of managing the state of the various computations is done by the thread lib instead of by you.

Yes. And it turns out that in most practical applications userland threading is actually built on an asynchronous IO select-like call. Think of Python's greenlet-based concurrency mechanisms (gevent and eventlet). There you get the advantages of using threads together with small per-thread storage and switching costs. So I think this is the sanest way to handle a highly concurrent, complex application.

Ok, I lied. The sanest way is to also provide data isolation & CPU-based concurrency. For that you need Erlang (or maybe Elixir, which runs on the Erlang BEAM VM).


Well, callback style is closest to the metal, greenlets look nicer (at the expense of an opaque dispatch loop), and IMHO both of them cause pain over time :/

It's like a deal with the devil for performance... I guess I am disagreeing with the grandparent post. Maybe in Haskell it works (or is free), but in C/C++ it works, and you can see you're paying for it.


> It's like a deal with the devil for performance

Well put. You usually pay for it. I'm not familiar enough with Haskell to tell; it might provide the solution to trick the devil. The devil usually wins the bets though ;-)


Paying for it how? You get a simple model to work with and reason about, and the performance of an async event loop. Where does the problem arise?


OK. Can you block on a SysV semaphore and a socket at the same time? How do you integrate event loops using these two primitives at the same time?

(EDIT: hint: your runtime can't, because the kernel doesn't provide a way to do that. I guess I should be more explicit.)


> I am not blocking or using event loops, that's the point. When I call read on a socket, the IO manager handles saving my context, doing the async read, yielding, and then restoring context and resuming when there's data to be read.

Doesn't this make it harder to reason about cases where multiple userland threads need to mutate the same state? You can't tell by inspection whether the function you're calling is going to yield somewhere deep inside.

With single threaded callback style you know your thread is the only thing that is executing and won't be interrupted. (Sure, other code can run between now and when your callback is invoked, but in practice I find that to be pretty easy to deal with.)


>Doesn't this make it harder to reason about cases where multiple userland threads need to mutate the same state?

Most languages make it very hard to reason about state in general. Haskell provides STM, so ensuring correct access to shared state is as simple as a transaction over a TVar (or, for simpler cases, an MVar). I absolutely agree that this is an important part of languages making concurrency a priority; they can't just add green threads and pretend that is good enough.
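For the record, MVar is a separate primitive from STM; the STM story proper goes through TVars and atomically. A small sketch (the counter and thread count are illustrative, not from the thread):

```haskell
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM_)

-- 100 green threads each atomically increment a shared counter.
-- STM runs each increment as an isolated transaction, so no update
-- is lost no matter how the threads interleave.
main :: IO ()
main = do
  counter <- newTVarIO (0 :: Int)
  done    <- newEmptyMVar
  forM_ [1 .. 100 :: Int] $ \_ -> forkIO $ do
    atomically (modifyTVar' counter (+ 1))
    putMVar done ()
  replicateM_ 100 (takeMVar done)   -- wait for all increments
  readTVarIO counter >>= print      -- 100
```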


Correct me if I'm wrong but STM doesn't help you if your operations include side effects outside of STM (writing to disk, a database, the network)--exactly what async operations are usually used for. Aren't you back to locking at that point?


I don't understand the problem you're describing. If you're waiting on some external resource then you've yielded to another "thread" in either an event loop or green thread style. Are you suggesting partially modifying some state, then waiting on an external resource before finishing modifying the state? And no other "threads" can touch the state in the mean time? That is going to be a bad thing to do no matter what style you use.


Yes. My point is in the green (or native) thread style, it is not always obvious when you've stumbled into this situation (since you can't tell by looking at a function call whether it's going to yield at some point).


I am not blocking or using event loops, that's the point. When I call read on a socket, the IO manager handles saving my context, doing the async read, yielding, and then restoring context and resuming when there's data to be read. The entire point is that green threads are equivalent to event loops, but taking advantage of abstraction to move the complexity away from the thousands of individual programmers, and onto the language authors.
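A tiny sketch of that park-and-resume behaviour, using threadDelay as a stand-in for a blocking socket read (the names and timing are my own, not from the thread): the forked thread parks in the runtime while main keeps going.

```haskell
import Control.Concurrent

-- The forked thread "blocks" in threadDelay; the runtime parks it with
-- its timer/IO manager and keeps running other green threads meanwhile.
main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    threadDelay 100000          -- looks blocking; only this green thread parks
    putMVar done "slow thread woke up"
  putStrLn "main keeps running while the other thread is parked"
  takeMVar done >>= putStrLn
```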


(I was going to opt out of this, but I can spend another 1/2 hour on it.)

What you are saying is, you're a client of someone else's non-blocking event loop. So all the classical stuff regarding non-blocking IO loops applies.

E.g: http://talkingcode.co.uk/2008/12/02/haskell-gtk-and-multi-th...

But substitute Gtk for Qt, or anything where you're the sucker dealing with an external concurrency model or event loop.

Hilariously, they end up having the same debate we are here, just in Haskell.

Seriously, there's no pixie dust for this.


That link is a good example of exactly why we need to push for languages to take concurrency seriously. It is an example of how a Haskell user has to deal with the issue when interfacing with a C lib that rolled its own event loop. In most languages, people have to deal with that issue all the time, for their own code, entirely in that one language.

I have no idea why you are talking about pixie dust. My point was that explicitly using event loops instead of abstracting it into a green thread library is dumb, and we need to start having higher expectations from languages rather than accepting backwards shit like node. Not "green threads are magic".


>Performance.

Is identical.

>And the best solution was one based on a select/epoll/kqueue loop

But the underlying mechanism does not need to dictate what interface is exposed to be used.

>And it turns out that in most practical applications userland threading is actually built on an asynchronous IO select-like call.

That is precisely the point. Presenting it as though the only options are "use native threads" and "use callbacks and event loops" is the myth. Use userland threads: you get the same performance and scalability as an event-driven style, since it uses the exact same async calls under the hood, but you get a programming style that is readable and maintainable.
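As a sketch of that "same machinery, nicer interface" point (the worker count and channel names are my own illustration): straight-line code over a Chan reads as blocking, while the runtime does the event-style waiting underneath.

```haskell
import Control.Concurrent
import Control.Monad (forM_, replicateM)

-- Each worker just calls readChan, which parks the green thread until a
-- job arrives -- no callbacks, no explicit event loop in user code.
main :: IO ()
main = do
  jobs    <- newChan
  results <- newChan
  forM_ [1 .. 4 :: Int] $ \_ -> forkIO $ do
    x <- readChan jobs              -- looks blocking, parks cheaply
    writeChan results (x * x)
  mapM_ (writeChan jobs) [1 .. 4 :: Int]
  rs <- replicateM 4 (readChan results)
  print (sum rs)                    -- 1 + 4 + 9 + 16
```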

Using an event loop is like rolling your own while loop out of setjmp/longjmp. Yes, the underlying mechanism of a loop is the same, but abstraction is pretty nice.

>For that you need Erlang

Why?


> For that you need Erlang

Because the other side of highly concurrent applications is increased complexity. Both thread-based and async-based _large_ applications become a mess, fast. I'm not talking about a small HAProxy-type server, but about complicated business rules, state machines and so on. Shared state is hard to reason about even in a non-concurrent application; large concurrent ones, especially with shared data structures, become complicated.

That is why lightweight isolated-process-based concurrency is the best, IMHO. Erlang is at the forefront of that. It also has a completely concurrent garbage collector (it won't stop the world), since each process has a private heap. But as the infomercial guy says, "But wait, that's not all": add hot code reloading, plus being able to schedule all these lightweight processes on any number of CPUs in an m:n fashion, and that takes the cake.

Aha, so is it all rainbows and unicorns? No. You pay for it with a sequential slowdown. So if you benchmark Mandelbrot or matrix multiplication in C++ vs Erlang, C++ will win. Safety and isolation don't come for free. So pick and choose based on what requirements you have. There is also the syntax, which some people don't like, so those are the trade-offs.


None of that explains why you need erlang. Any language can do message passing. The whole point is that we should be expecting these capabilities from every language, rather than pretending using an event loop is in any way reasonable.


You don't. But Erlang is the complete package, where isolation and fault tolerance were built in from the start (in the VM, the toolchain, the libraries and so on).

Go, Rust, the DartVM, Haskell and Scala's Akka all do it. You can do it in C++ and even C. The problem is that unless there is true isolation, there is always a chance of global data being accessed and updated. There are locks, mutexes, barriers, etc. That is where true isolation comes in.


>> Performance.

> Is identical.

In a very high-concurrency environment with OS-based threads, it is not always the case. OS threads start to show their overhead (both in switching and in memory consumption) when we get to 10K+ concurrent requests (I am assuming some server-client architecture here).

Async or green threads have a smaller switching overhead, but CPU-based concurrency is usually missing (save for Erlang, Go, Haskell & Rust?). So for small IO-bound programs it becomes hard to beat an epoll-based loop (as nginx and HAProxy show). As soon as any dispatch starts computing things, the whole OS process blocks, sockets start throwing errors, and everything goes to shit.


>In a very high-concurrency environment with OS-based threads, it is not always the case.

We're talking about green threads, remember?

>Async or green threads have a smaller overhead of switching, but any CPU based concurrency is usually missing

The point is that async and green threads are the same thing, just one has a better interface. Of course they both have the same downside.


> We're talking about green threads, remember?

Sorry, I lost context. Yes, for green threads you are right. I had regular (OS) threads in mind.


Don't you trade for a different type of complexity in yield/resume?

Also, you will lose some of the scalability advantages, because you have to store context for all of the idle fibers.


In Haskell at least you do not have to explicitly yield. Instead, any allocation (a potential garbage-collection point) is treated as a possible yield point (so it is possible to hog a CPU if you write a non-allocating loop).
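As an illustration of the escape hatch GHC offers for such loops (the loop body here is contrived, my own example): Control.Concurrent.yield inserts an explicit scheduling point, so even a tight loop stays cooperative.

```haskell
import Control.Concurrent
import Data.IORef

-- A tight loop with an explicit yield, so other green threads on the
-- same capability still get a chance to run between iterations.
spin :: IORef Int -> Int -> IO ()
spin _   0 = return ()
spin ref n = do
  modifyIORef' ref (+ 1)
  yield                      -- explicit cooperative scheduling point
  spin ref (n - 1)

main :: IO ()
main = do
  ref <- newIORef 0
  spin ref 1000
  readIORef ref >>= print    -- 1000
```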

Not sure what to say about your second point: you do have to store context for the other fibers, but only in the same way you have to pay to store the context of an OS thread or pending callback.


>Don't you trade for a different type of complexity in yield/resume?

That depends on the library in question. Generally, the event/callback-style code people write only "yields" at calls to the non-blocking versions of otherwise blocking functions, like the read-type functions, or accept. Most userland threading libs I've seen assume that is what you want, so you don't need to yield yourself. If you did want that level of control, it is still simpler than achieving the same result in the event style.

>Also, you will lose some of the scalability advantages, because you have to store context for all of the idle fibers.

The overhead is small, especially if you know you don't need much stack and set the thread stack size lower.



