Wondering about something: if you need to have a long task (5s to 10s) in the ba...

denibertovic · on June 18, 2014

Hmm, seems we're talking about 2 things here.

If your ajax request requires long task processing and requires you to wait for it than this is not a background task any more, it's done in one of the web server threads, and even if the thread outsources the task to another process it's still waiting on that proces to finish before returning the ajax response. This is bad.

I'm not entirely convinced about websocket solutions in Python yet, but I've been told flask-websockets is awesome. Nevertheless this doesn't solve the problem for you. Cause the request is just keeping an open line and waiting for a respone....blocking is bad.

The most simplest advise I would have is to have the ajax request trigger a background task and return immediately. The background task will then have some kind of side effect (ie. write some result to a database somewhere) which the ajax request can the look for with some kind of polling mechanism (on some other endpoint). Of course you can complicate this a lot, depending on your needs, but this seemed like the most straightforward solution.

zo1 · on June 18, 2014

"I'm not entirely convinced about websocket solutions in Python yet, but I've been told flask-websockets is awesome. Nevertheless this doesn't solve the problem for you. Cause the request is just keeping an open line and waiting for a response....blocking is bad." Tornado only blocks if you do something silly. It's event based, and can keep hundreds of connections open and waiting for it's async response event before actioning/responding the open connection.

"The most simplest advise I would have is to have the ajax request trigger a background task and return immediately. The background task will then have some kind of side effect (ie. write some result to a database somewhere) which the ajax request can the look for with some kind of polling mechanism (on some other endpoint)." Wow, overkill much? Polling is bad, and is exactly the kind of bad solution that a lot of these libraries are in place to prevent developers from needing to do.

Websockets were made to solve the long-polling and poll-spamming that was prevalent. Now all you have to do is keep a light, open web-socket connection to the server. And the server, being async/evented, will respond when the task is good and ready. Nice and clean.

denibertovic · on June 18, 2014

"Tornado only blocks if you do something silly. It's event based, and can keep hundreds of connections open and waiting for it's async response event before actioning/responding the open connection." - Yes pure tornado based apps are probably fine if you know what you are doing.

"Wow, overkill much? Polling is bad, and is exactly the kind of bad solution that a lot of these libraries are in place to prevent developers from needing to do." - Polling is not bad if you have a good use case. You just cannot do non-blocking stuff with Django for instance, or it's very very hard and tricky. Websockets also limit you with the number of connections you can have open at once.

oulipo · on June 18, 2014

So you think polling is the most effective solution, it is perhaps the case.

I was thinking whether using something like gevent or Tornado, a bit like nodejs, would enable the webserver to keep the socket open without blocking while the computation is made in a worker, then return the result simply to the socket, thus avoiding having to write a more complex websocket-based or polling-based system, but rather using AJAX transparently :)

denibertovic · on June 18, 2014

Doing non-blocking is tricky, and I'm not convinced that Python's solution are where I'd like them to be on this topic. Also keep in mind that a number of open TCP connections is also a finite number, so you can't really scale well with websockets that way, IMO. But again, it depends on your use case.

scott_w · on June 18, 2014

For what it's worth, I'm working on exactly this problem with Django+Celery.

Polling seems to be the best way to do it, as it doesn't leave sockets open, and doesn't require a websocket enabled browser.

The implementation I'm working on involves keeping the task metadata in the DB, and polling against that lookup (it makes it easier to do things like restrict task results to specific users as well).

I was also thinking that another way to do it could be to write the result in its final format to a /ajax_output/ directory with a randomly generated name. Then your polling would depend entirely on nginx, which could end up being much more efficient than running through your application framework. Just make sure you regularly clean unused files if you have privacy concerns.

jaegerpicker · on June 18, 2014

I really like tornado and websockets but keep in mind it gets dicey to scale on one box after you get to about 50 open connections at the same time on one box. You can do things to stretch that out but it's not the easiest thing. You also still have browser requirement issues. So it really depends on your use case polling, which is my least favorite method, is the most versatile method. It's easy to use flask for all of these issues. That said I'm a big fan of Tornado.

jkarneges · on June 18, 2014

Pushpin could be good for keeping sockets open for the duration of a task without tying up threads in the webapp.

http://blog.fanout.io/2013/04/09/an-http-reverse-proxy-for-r...

mctx · on June 18, 2014

What's the use case? Do you need to know exactly when the task is done? Does it vary in duration significantly? Can you split the call into two - one to start it, another to check the status given an ID?

oulipo · on June 18, 2014

It could be the user sending a computation to the server and wanting its interface to be updated as soon as the computation is done, it is feasible by regularly polling the backend after launching a worker process, but this adds complexity compared to simply opening a non-blocking socket a la nodejs & waiting for the worker to finish its job & sending the result back to the browser