
Queue everything and delight everyone - rajbala
http://decafbad.com/blog/2008/07/04/queue-everything-and-delight-everyone/
======
dchuk
I've built quite a few apps that use queueing to do the heavy processing
behind the scenes, and I've always used a polling system to check progress.
But it always has felt like a half-baked solution.

Are there any patterns, documentation, books, tutorials, or anything else out
there that cover how to have users submit something into a queue system, get
immediate feedback, see accurate progress updates as the job is processed, and
then see the results, all without having to refresh the page or navigate away
and back again?

Let's say for instance, I built a service that lets a user upload 20 images at
a time to be cropped and resized. Is there a commonly accepted pattern for
alerting them of the resizing job progress as it is happening?

~~~
habitue
For users in a browser, use websockets for modern browsers, and long polling
for older browsers (Socket.IO wraps up these solutions nicely)

For sending async results to other servers, use webhooks. The caller gives you
an endpoint, and you call it when the job is done processing.
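
A minimal sketch of the webhook side, assuming the caller registered a callback URL when it submitted the job. The method names, the payload fields and the example URL are made up for illustration; only the Ruby standard library is used.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Build the POST that tells the caller's server a job has finished.
# callback_url is whatever endpoint the caller supplied at submit time.
def build_webhook_request(callback_url, job_id, status)
  uri = URI.parse(callback_url)
  request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(job_id: job_id, status: status)
  [uri, request]
end

# Send it. A worker would call this once the job is done.
def deliver_webhook(callback_url, job_id, status)
  uri, request = build_webhook_request(callback_url, job_id, status)
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

Retries and signing the payload are left out; a real integration would probably want both.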

~~~
dchuk
But that still doesn't really solve the whole progress reporting part of it.
As per my example, the "job" would really be 20 individual cropping/resizing
jobs. In order to know that the whole set of jobs is complete, there needs to
be a checking job or something similar that compares the number of completed
jobs against the total number started and, once they match, tells the user.

Is there a pattern for that type of situation? Some sort of watcher pattern or
master job monitoring pattern that allows accurate progress reporting?

~~~
habitue
Send progress events over your push channel. Push an event every 1% or 10%, or
at whatever granularity your experience needs.
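
For the 20-image example, a rough sketch could look like the following. `BatchProgress` and its event fields are hypothetical names, and the block passed to it stands in for whatever actually writes to the websocket or Socket.IO channel.

```ruby
# Tracks a batch of sub-jobs and pushes one event each time a sub-job finishes.
class BatchProgress
  def initialize(total, &notifier)
    @total = total
    @done = 0
    @lock = Mutex.new      # workers may report from several threads
    @notifier = notifier   # stand-in for the push channel
  end

  # Called by a worker when its sub-job finishes.
  def job_finished
    event = @lock.synchronize do
      @done += 1
      { done: @done, total: @total,
        percent: 100 * @done / @total, complete: @done == @total }
    end
    @notifier.call(event)
    event
  end
end

events = []
batch = BatchProgress.new(20) { |event| events << event }
20.times { batch.job_finished }
```

The last event carries `complete: true`, so no separate checking job is needed here. With workers in separate processes the counter would have to live in a shared store instead of in memory.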

If you're talking about generic progress reporting from a tool that doesn't
know what the underlying job consists of (i.e. it only knows "started" "done"
"errored"), then I'm not aware of any pattern to provide an accurate progress
meter. You could perhaps keep statistics on the length of past runs (like some
CI servers do for build times), but it would really depend on your application
whether users are willing to accept that those progress bars are just a guess
and sometimes run over.
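
A sketch of that statistics-based guess, assuming you keep the durations of past runs somewhere. The method name and the sample numbers are made up; the cap at 99 is one way to handle runs that take longer than average.

```ruby
# Estimate progress from the average duration of past runs.
# This is only a guess, so it is capped at 99 until the job reports "done".
def estimated_percent(past_durations, elapsed_seconds)
  return 0 if past_durations.empty?

  average = past_durations.sum.to_f / past_durations.size
  percent = (elapsed_seconds / average * 100).floor
  [percent, 99].min
end

history = [40.0, 50.0, 60.0]               # seconds taken by earlier runs (made-up data)
halfway = estimated_percent(history, 25)   # half of an average run has elapsed
overrun = estimated_percent(history, 80)   # ran over: held at 99
```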

------
orthecreedence
The implementation/operational complexity of a queue (assuming you don't just
hammer it into your existing database, which can sometimes be a bad idea) often
outweighs the cost of making the user wait an extra 200ms.

It's nice when you _have_ to have a queue for image/audio/video processing,
page scraping, etc because then you can use the queue for other stuff, but for
the most part, excluding large jobs, you can get away without having a queue
for quite a while.

So if you have a queuing system, great, use it. If you don't, try to avoid
building it until you really need it.

~~~
MaxGabriel
What's your experience with Databases as queues? I was interested in using
Queue Classic (Postgres based) because SQL is so nice, plus databases are
already quite good at not losing data.

~~~
orthecreedence
My only real experience is with MongoDB, which we chose for queuing because we
were already thinking of using it for our main document store. It flopped,
entirely. Queue systems are generally very write-heavy...you're constantly
grabbing values while at the same time modifying them.

I think if we had used any *SQL database we would have been fine. Although
they don't necessarily have high write throughput compared to a lot of
in-memory/distributed systems, they certainly do better than a global write
lock.

I haven't ever used Postgres so I don't know of its queuing capabilities. If it
has things like atomic modify-and-read, then you'll be fine. After the MongoDB
debacle, I went to another company and we used Beanstalkd for a dedicated
queue...couldn't have been happier. The protocol is dead simple, and it's so
beautifully geared towards being a queuing system that I'll probably never
want to go back to any other DB for queuing =]. Worth a shot if you need a
queue and have the resources to run it, otherwise I bet Postgres will work
just fine.
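
To make "atomic modify-and-read" concrete, here is an in-memory stand-in, not real database code. `JobTable` is a hypothetical class, and the Mutex plays the role of the database's guarantee that finding a pending job and marking it taken happen as one step (in SQL that would typically be a single statement, such as an UPDATE that returns the row it changed).

```ruby
# In-memory stand-in for a jobs table with an atomic "claim" operation.
class JobTable
  Job = Struct.new(:id, :payload, :state, :worker)

  def initialize
    @jobs = []
    @lock = Mutex.new
  end

  def enqueue(payload)
    @lock.synchronize do
      job = Job.new(@jobs.size + 1, payload, :pending, nil)
      @jobs << job
      job.id
    end
  end

  # Find the oldest pending job and mark it as running in one atomic step.
  # Returns nil when nothing is pending.
  def claim(worker_name)
    @lock.synchronize do
      job = @jobs.find { |j| j.state == :pending }
      return nil unless job

      job.state = :running
      job.worker = worker_name
      job
    end
  end
end

table = JobTable.new
10.times { |i| table.enqueue("image-#{i}") }

claimed = Queue.new
workers = 4.times.map do |n|
  Thread.new do
    while (job = table.claim("worker-#{n}"))
      claimed << job.id
    end
  end
end
workers.each(&:join)
```

Because the find and the update share one critical section, no two workers ever get the same job. Without that, two workers could both read the same pending row before either marks it.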

------
ambiate
I'm currently working with ActiveMQ. I try to think of it as multiple IRC
networks. Channels as topics for users by users, servers as gateways to
isolated materials/users, server global messages as topics for users by
servers, server local messages as queues for users by servers and private
messages as queues for users by users.

It is absolutely fascinating offloading a message to the queue and letting the
back-end handle workflow and shuffling it to another queue for micro-tasking
(such as in the article). In fact, the topic/queue paradigm has really shifted
my whole view on services.
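
For anyone unfamiliar with the topic/queue split, a toy sketch of the two delivery semantics follows. This is not ActiveMQ's API; `Topic` and `WorkQueue` are hypothetical in-process classes, and a real broker would also buffer messages instead of dropping them when nobody is listening.

```ruby
# Topic: every subscriber receives every message (publish/subscribe).
class Topic
  def initialize
    @subscribers = []
  end

  def subscribe(&handler)
    @subscribers << handler
  end

  def publish(message)
    @subscribers.each { |handler| handler.call(message) }
  end
end

# Queue: each message goes to exactly one consumer (round-robin here).
class WorkQueue
  def initialize
    @consumers = []
    @next = 0
  end

  def subscribe(&handler)
    @consumers << handler
  end

  def publish(message)
    return if @consumers.empty?

    @consumers[@next % @consumers.size].call(message)
    @next += 1
  end
end

alice = []
bob = []

announcements = Topic.new
announcements.subscribe { |m| alice << m }
announcements.subscribe { |m| bob << m }
announcements.publish('deploy at 5')   # both alice and bob receive it

resize_jobs = WorkQueue.new
resize_jobs.subscribe { |m| alice << m }
resize_jobs.subscribe { |m| bob << m }
resize_jobs.publish('resize 1')        # goes to alice only
resize_jobs.publish('resize 2')        # goes to bob only
```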

Forgot to mention the queue added a layer of security. Our front-end is now
the same as spawning a shell in the server (as far as data access). If you can
spawn a shell on the webserver, the server is isolated from everything except
the MQ ip/port combo. This means that what was possible on the front end is
the only thing possible on the back end (as far as penetrating data). I'm sure
a clever person could figure out a way, but for your average person, that
will be a show stopper.

~~~
encoderer
We use ActiveMQ and it's fine. It speaks STOMP, which is nice for interop. It
runs on the JVM, so it gets a little weird when it's out of memory.

We're also playing with Kafka which takes things in a different direction than
the JMS-based systems like ActiveMQ. So far so good.

------
fragsworth
I understand the purpose of doing this, but in most cases I don't see any
extremely simple way to implement it. Lots of decisions tend to revolve around
whether the development/maintenance time spent doing something justifies the
efficiency provided by it.

Especially if you're building a minimum viable product, if something on a
server responds 10ms later than a queue implementation, but takes 1 hour to
implement instead of 5, I am inclined to go with the 10ms delay.

------
MaxGabriel
Something at work we use that's great for this is ResqueDelayable. Typically I
see Resque tutorials extract bits of logic out into worker classes, but this
adds refactoring/organization overhead. ResqueDelayable allows this non-queued
work:

    
    
    def post_comment
      add_comment
      self.notify_followers         # Could have lots of followers
      Analytics.record_new_comment  # Unnecessary for API response
    end
    

to become queued work, with no real refactoring:

    
    
    def post_comment
      add_comment
      self.rdelay.notify_followers
      Analytics.rdelay.record_new_comment
    end
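
For the curious, a proxy like `rdelay` can be sketched in a few lines. This is a guess at the general idea, not ResqueDelayable's actual implementation: `DelayProxy`, `Delayable`, `run_pending` and the plain array used as the queue are all hypothetical.

```ruby
# Records a method call instead of running it.
class DelayProxy
  def initialize(target, queue)
    @target = target
    @queue = queue
  end

  def method_missing(name, *args)
    return super unless @target.respond_to?(name)

    @queue << [@target, name, args]   # a worker runs this later
    nil
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

module Delayable
  # A plain array standing in for the real job queue.
  def self.queue
    @queue ||= []
  end

  def rdelay
    DelayProxy.new(self, Delayable.queue)
  end
end

class Comment
  include Delayable
  attr_reader :log

  def initialize
    @log = []
  end

  def notify_followers
    @log << :notified
  end

  def post_comment
    @log << :added
    rdelay.notify_followers   # queued, not run inline
  end
end

# What a worker would do: pop each recorded call and run it.
def run_pending(queue)
  until queue.empty?
    target, name, args = queue.shift
    target.public_send(name, *args)
  end
end

comment = Comment.new
comment.post_comment            # returns right away; only :added has happened
run_pending(Delayable.queue)    # the worker catches up and notifies followers
```

A real queue can't hold live objects like this. It would have to serialize something like a class name, a record id and the arguments, then look the object up again inside the worker.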

------
lstamour
FYI: Published in 2008. (Still true today, but I think we've learned these
lessons now.)

------
jbuzbee
Makes me think of Meteor, where changes to a document are reflected in the
client immediately and then possibly rolled back if the server-side
equivalent change fails, which typically would be very rare.
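
The general shape of that optimistic update, as a rough sketch. This is not Meteor's code or API; `OptimisticDoc` is a hypothetical class, and the block stands in for the server round trip, which in a real client would be asynchronous.

```ruby
# Apply a change locally first, then undo it if the server rejects it.
class OptimisticDoc
  attr_reader :fields

  def initialize(fields)
    @fields = fields.dup
  end

  # The block is the "server": it returns true to accept the change.
  def update(key, value)
    previous = @fields.dup          # snapshot for rollback
    @fields[key] = value            # the user sees this immediately
    accepted = yield(key, value)
    @fields = previous unless accepted
    accepted
  end
end

doc = OptimisticDoc.new({ title: 'Draft' })
accepted = doc.update(:title, 'Final') { |_key, _value| true }       # server accepts
rejected = doc.update(:title, '') { |_key, value| !value.empty? }    # server rejects, rolled back
```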

