Wondering about something: if you need to run a long task (5 to 10 s, or even longer) in the background for an AJAX request, which approach is better:
- use gevent + gunicorn, or Tornado, in order to keep a socket open while the worker is processing the task?
- use polling? (less efficient)
- use websockets (but then the implementation is perhaps a bit more complex)
If your AJAX request requires long task processing and you have to wait for it, then this is no longer a background task. It runs in one of the web server threads, and even if that thread hands the task off to another process, it still waits for that process to finish before returning the AJAX response. This is bad.
I'm not entirely convinced about websocket solutions in Python yet, but I've been told flask-websockets is awesome. Nevertheless, this doesn't solve the problem for you, because the request is just keeping an open line and waiting for a response... blocking is bad.
The simplest advice I would have is to have the AJAX request trigger a background task and return immediately. The background task will then have some kind of side effect (i.e. write some result to a database somewhere), which the client can then check for with some kind of polling mechanism (on some other endpoint). Of course you can complicate this a lot, depending on your needs, but this seems like the most straightforward solution.
"I'm not entirely convinced about websocket solutions in Python yet, but I've been told flask-websockets is awesome. Nevertheless, this doesn't solve the problem for you, because the request is just keeping an open line and waiting for a response... blocking is bad."
Tornado only blocks if you do something silly. It's event-based, and can keep hundreds of connections open, each waiting for its async response event before responding on that connection.
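To illustrate the event-based idea without depending on Tornado itself, here is a sketch with the standard `asyncio` module (Tornado 5+ runs on the asyncio event loop under Python 3). `handle_request` and `wait_for_worker` are hypothetical, and `asyncio.sleep` stands in for awaiting a result from a real worker. A single thread keeps many requests pending at once because each one is suspended at an `await` instead of blocking.

```python
import asyncio
import time


async def wait_for_worker(request_id: int, delay: float) -> str:
    # Stand-in for awaiting a worker's reply (queue message, subprocess, ...).
    # While this coroutine is suspended, the event loop serves other requests.
    await asyncio.sleep(delay)
    return f"result for request {request_id}"


async def handle_request(request_id: int) -> str:
    # Hypothetical handler: the client's connection stays open here,
    # but no thread is blocked while we wait.
    return await wait_for_worker(request_id, delay=0.5)


async def main(n: int = 200) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request(i) for i in range(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    # All requests waited concurrently on one thread, so the total is close
    # to a single 0.5 s wait rather than n * 0.5 s.
    print(len(results), f"{elapsed:.2f}s")
```

The catch raised in the replies still applies: this only works if everything on the request path is awaitable; one blocking call inside a handler stalls every other connection.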
"The simplest advice I would have is to have the AJAX request trigger a background task and return immediately. The background task will then have some kind of side effect (i.e. write some result to a database somewhere), which the client can then check for with some kind of polling mechanism (on some other endpoint)."
Wow, overkill much? Polling is bad, and is exactly the kind of bad solution that a lot of these libraries are in place to prevent developers from needing to do.
Websockets were made to solve the long-polling and poll-spamming that used to be prevalent. Now all you have to do is keep a light, open websocket connection to the server, and the server, being async/evented, will respond when the task is good and ready. Nice and clean.
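The push model can be sketched with a plain TCP stream from `asyncio`. This is only a stand-in: a real websocket library adds the HTTP upgrade handshake and message framing, which this sketch omits. The server keeps the connection open and writes the result the moment the (simulated) work finishes, and the client makes no further requests. `handle_client` and `run_demo` are hypothetical names.

```python
import asyncio


async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    # Read one job request (a number) from the persistent connection.
    line = await reader.readline()
    n = int(line.decode().strip())
    await asyncio.sleep(0.3)  # stand-in for waiting on a background worker
    # Push the result as soon as it is ready; the client never polls.
    writer.write(f"{n * n}\n".encode())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def run_demo(n: int) -> int:
    # Port 0 lets the OS pick a free port on localhost.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"{n}\n".encode())
        await writer.drain()
        # The client simply waits on the open connection for the pushed reply.
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
    return int(reply.decode())


if __name__ == "__main__":
    print(asyncio.run(run_demo(7)))  # 49
```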
"Tornado only blocks if you do something silly. It's event-based, and can keep hundreds of connections open, each waiting for its async response event before responding on that connection." - Yes, pure Tornado-based apps are probably fine if you know what you are doing.
"Wow, overkill much? Polling is bad, and is exactly the kind of bad solution that a lot of these libraries are in place to prevent developers from needing to do." - Polling is not bad if you have a good use case. You just can't do non-blocking work with Django, for instance, or it's very hard and tricky. Websockets also limit the number of connections you can have open at once.
So you think polling is the most effective solution? Perhaps that's the case.
I was wondering whether something like gevent or Tornado (a bit like Node.js) would let the web server keep the socket open without blocking while a worker does the computation, and then simply return the result on that socket. That would avoid writing a more complex websocket-based or polling-based system and let me use plain AJAX transparently :)
Doing non-blocking I/O is tricky, and I'm not convinced that Python's solutions are where I'd like them to be on this topic. Also keep in mind that the number of open TCP connections is finite, so you can't really scale well with websockets that way, IMO. But again, it depends on your use case.
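On Unix-like systems one concrete ceiling is easy to inspect: each open socket occupies a file descriptor, and the per-process descriptor limit can be read with the standard `resource` module. A small sketch (the helper name is hypothetical; `resource` is not available on Windows):

```python
import resource
from typing import Optional


def open_fd_limit() -> Optional[int]:
    """Return this process's soft limit on open file descriptors.

    Every accepted TCP connection (websocket or plain HTTP) consumes one
    descriptor, so this caps concurrent connections per process.
    Returns None if the limit is reported as unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    print("max open descriptors for this process:", open_fd_limit())
```

The soft limit can be raised up to the hard limit with `resource.setrlimit` (or `ulimit -n` in the shell).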
For what it's worth, I'm working on exactly this problem with Django+Celery.
Polling seems to be the best way to do it, as it doesn't leave sockets open and doesn't require a websocket-enabled browser.
The implementation I'm working on involves keeping the task metadata in the DB, and polling against that lookup (it makes it easier to do things like restrict task results to specific users as well).
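A sketch of that DB-backed approach with `sqlite3` from the standard library, assuming a hypothetical `tasks` table with an `owner` column. The polling query filters on both the task ID and the requesting user, so one user can't read another's result. A Django+Celery setup would use its own models and result backend instead; the function names here are illustrative.

```python
import sqlite3
import uuid
from typing import Optional

# Hypothetical schema for task metadata.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id     TEXT PRIMARY KEY,
    owner  TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    result TEXT
)
"""


def create_task(db: sqlite3.Connection, owner: str) -> str:
    # Called by the 'start' endpoint before handing the job to a worker.
    task_id = uuid.uuid4().hex
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO tasks (id, owner) VALUES (?, ?)",
                   (task_id, owner))
    return task_id


def finish_task(db: sqlite3.Connection, task_id: str, result: str) -> None:
    # Called by the background worker when it is done.
    with db:
        db.execute("UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
                   (result, task_id))


def poll_task(db: sqlite3.Connection, task_id: str,
              user: str) -> Optional[dict]:
    # The polling endpoint: the owner filter restricts results per user.
    row = db.execute(
        "SELECT status, result FROM tasks WHERE id = ? AND owner = ?",
        (task_id, user),
    ).fetchone()
    if row is None:
        return None
    return {"status": row[0], "result": row[1]}


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    tid = create_task(db, "alice")
    print(poll_task(db, tid, "alice"))  # {'status': 'pending', 'result': None}
    finish_task(db, tid, "42")
    print(poll_task(db, tid, "alice"))  # {'status': 'done', 'result': '42'}
    print(poll_task(db, tid, "bob"))    # None
```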
I was also thinking that another way to do it could be to write the result, in its final format, to an /ajax_output/ directory under a randomly generated name. Your polling would then depend entirely on nginx, which could be much more efficient than going through your application framework. Just make sure you regularly clean up unused files if you have privacy concerns.
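A sketch of the file-drop variant, assuming a hypothetical directory that nginx serves at `/ajax_output/`. The random name is generated when the task starts and returned to the client, which then polls that URL until it stops returning 404. The worker writes the file atomically so a poll never sees partial JSON, and a periodic purge handles the cleanup. All function names are hypothetical.

```python
import json
import os
import secrets
import tempfile
import time


def new_result_name() -> str:
    # Unguessable, URL-safe file name; return this to the client at task start.
    return secrets.token_urlsafe(16) + ".json"


def publish_result(output_dir: str, name: str, payload: dict) -> None:
    """Called by the worker: write payload as JSON under the given name."""
    os.makedirs(output_dir, exist_ok=True)
    # Write to a temp file in the same directory, then rename it into place,
    # so nginx never serves a half-written file (os.replace is atomic on POSIX).
    fd, tmp_path = tempfile.mkstemp(dir=output_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    # mkstemp creates the file owner-only (0600); make it readable by the
    # web server user before it appears under its final name.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, os.path.join(output_dir, name))


def purge_old_results(output_dir: str, max_age_seconds: float) -> int:
    """Delete result files older than max_age_seconds; return how many."""
    cutoff = time.time() - max_age_seconds
    removed = 0
    for entry in os.scandir(output_dir):
        if (entry.is_file() and entry.name.endswith(".json")
                and entry.stat().st_mtime < cutoff):
            os.remove(entry.path)
            removed += 1
    return removed
```

Running `purge_old_results` from cron or a scheduled task keeps the directory from accumulating stale (and possibly private) results.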
I really like Tornado and websockets, but keep in mind it gets dicey to scale once you reach about 50 simultaneous open connections on one box. You can do things to stretch that, but it's not the easiest thing. You also still have browser-support issues. So it really depends on your use case: polling, which is my least favorite method, is the most versatile one. Flask makes any of these approaches easy to implement. That said, I'm a big fan of Tornado.
What's the use case? Do you need to know exactly when the task is done? Does it vary in duration significantly? Can you split the call into two - one to start it, another to check the status given an ID?
It could be a user sending a computation to the server and wanting the interface updated as soon as the computation is done. That is feasible by regularly polling the backend after launching a worker process, but it adds complexity compared with simply holding a non-blocking socket open à la Node.js, waiting for the worker to finish its job, and sending the result back to the browser.
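On the client side, a fixed-interval loop is the simplest poller, but backing off keeps short tasks responsive while cutting request volume for long ones. The logic is sketched in Python for consistency with the other examples (in the browser it would be JavaScript with `setTimeout`). `poll_until_done` and its parameters are hypothetical, and `check` would wrap a request to the status endpoint.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    check: Callable[[], Optional[object]],
    initial_interval: float = 0.25,
    max_interval: float = 4.0,
    timeout: float = 60.0,
) -> object:
    """Call check() until it returns a non-None result.

    The wait between calls doubles each time, capped at max_interval.
    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    interval = initial_interval
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError("task did not finish in time")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)
```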
Can you do this simply using Flask?