Rails 4 will establish a new background job queueing API (github.com)
172 points by jroes on Apr 27, 2012 | 57 comments



Thanks for the accurate title - this is not a full queueing system, but a unified API for hooking in bigger, badder queueing engines like Resque.

The point is to standardize the interface so other plugins/gems can simply make calls to Rails.queue rather than try to accommodate every queueing engine themselves.


Someone please correct me if I am wrong.

Skimming through the code, this lets you register a Queue class to serialize your jobs. So, if you use something like Delayed Job, you register the corresponding DJ::Queue class, which stores the jobs in whatever backend you desire, and then you process them later via your daemon of choice.
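To make that concrete, here is a minimal Ruby sketch of such a registered queue class, assuming jobs are plain objects that respond to #run and can be serialized with Marshal. `MarshalingQueue`, `WelcomeEmailJob`, and `work_off` are hypothetical names, not Rails or Delayed Job API; the Array merely stands in for whatever table or Redis list a real backend would use.

```ruby
# Hypothetical job: any object that responds to #run.
class WelcomeEmailJob
  attr_reader :user_id

  def initialize(user_id)
    @user_id = user_id
  end

  def run
    "sent welcome email to user #{user_id}"
  end
end

# Hypothetical queue class: serializes each job on push and
# deserializes it later, when a separate worker drains the store.
class MarshalingQueue
  def initialize(store = [])
    @store = store # stand-in for a DB table, Redis list, etc.
  end

  def push(job)
    @store << Marshal.dump(job)
  end

  # Called by the worker/daemon: loads and runs every stored job, oldest first.
  def work_off
    results = []
    results << Marshal.load(@store.shift).run until @store.empty?
    results
  end
end

queue = MarshalingQueue.new
queue.push(WelcomeEmailJob.new(1))
queue.push(WelcomeEmailJob.new(2))
puts queue.work_off
```

Because only serialized bytes sit in the store, the worker that calls work_off could run in a different process from the one that pushed.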

So far so peachy keen. This is alright, I can get behind this - it will make moving between queueing solutions more palatable which is not a feature I can complain about.

My question then is: how will this work by default? Will the default Queue have some sort of callback that executes after it returns the response? For stuff like sending emails in small apps, this is actually palatable - I'm more concerned about user latency than sheer requests/second.


The default implementation is a stdlib Queue (http://www.ruby-doc.org/stdlib/libdoc/thread/rdoc/Queue.html) which will be consumed by a Rails::Queueing::ThreadedConsumer (https://github.com/rails/rails/blob/602000b/railties/lib/rai...). You just drop job objects into the queue, and the consumer thread will call #run on them.
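A rough approximation of that default, assuming only what is described here: a stdlib Queue plus one consumer thread that calls #run on each job. This is not the actual Rails::Queueing::ThreadedConsumer source; `TinyThreadedConsumer`, `LogJob`, and the `:stop` sentinel are made up for illustration.

```ruby
# Hypothetical consumer: pops jobs off a stdlib Queue on a background
# thread and calls #run on each, until it sees the :stop sentinel.
class TinyThreadedConsumer
  def initialize(queue)
    @queue = queue
  end

  def start
    @thread = Thread.new do
      loop do
        job = @queue.pop # blocks until a job is available
        break if job == :stop
        begin
          job.run
        rescue StandardError => e
          warn "job failed: #{e.message}" # keep consuming after a failure
        end
      end
    end
    self
  end

  # Let the thread finish the jobs already queued, then wait for it.
  def shutdown
    @queue.push(:stop)
    @thread.join
  end
end

# A job is any object with a #run method.
LogJob = Struct.new(:log, :message) do
  def run
    log << message
  end
end

queue = Queue.new
consumer = TinyThreadedConsumer.new(queue).start

log = []
queue.push(LogJob.new(log, "deliver signup email")) # returns immediately
queue.push(LogJob.new(log, "resize avatar"))
consumer.shutdown
p log
```

The trade-off is that jobs live only in this process's memory, so anything still queued is lost if the process dies; that fits josevalim's point that the default is not meant to be a robust queue system.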


It sounds like the default queue implementation is an in memory queue that runs in the same process on another thread.


[deleted]


J2EE was an entire specification for application frameworks. This is just defining the bridging interface between Rails and any (existing or not) queueing framework. It's already been achieved nicely with Rails.cache.

In any event, the web shows it is actually possible to separate interfaces from implementations. J2EE had other issues.


It would be awesome to have this in Django too.

Celery already does a great job, but it would be nice to have the batteries included.


I agree - I see this as similar to Django's pluggable caching backends.


Let's not. Celery is doing an absolutely fantastic job in this space; let's just stay out of their way and do what we can in terms of exposing APIs to make their job easier. There's a reason django-core didn't write celery in the first place: we didn't have the need or the expertise. There are other people with both, so let's let them do it.


> do what we can in terms of exposing APIs to make their job easier.

Isn't that exactly what Simon is suggesting above?

He's not saying Django should provide its own implementation of a background queue, he's saying it should provide a base API which could be implemented by any number of backends, of which assuredly Celery would be one -- just as happens now with cache backends.

I agree with Simon and the commenter above, this would be a great addition to Django. I think it fits with the Django "batteries included" philosophy -- in this day and age, a background queue is practically a requirement for anything but the most basic web app. It also encourages standalone Django application developers to make use of background queuing without fear of forcing a specific implementation on users.


Yes, that's exactly what I meant. Like you say: today, a background queue should be part of the default stack for a web app (just like a template engine, database, session storage and a cache have been in the past - components which Django has provided since day one). No need to re-implement celery, but encouraging the Django ecosystem to embrace offline queues (and letting reusable apps know that they can push tasks into an abstract queue of some sort) would be very healthy.


I agree that this would be a great addition to Django. Celery may not fit everyone's needs. I would rather have something lighter-weight for dev.


I wasn't suggesting having a generic queue API in Django; I was suggesting adding whatever APIs are needed to let that live outside Django entirely, whether that's better hooks into transactions (e.g. so I can wait until a DB transaction is committed to fire the task) or something else.
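The transaction hook mentioned here can be sketched as: buffer tasks while a transaction is open, hand them to the real queue only on commit, and drop them on rollback. A minimal Ruby sketch; `AfterCommitQueue` and its block-based `transaction` method are hypothetical, not any framework's API, and the Array stands in for the real queue.

```ruby
# Hypothetical wrapper: jobs pushed inside a transaction are held back
# until it commits, so a worker never picks up a task that refers to
# rows that are not yet visible (or that get rolled back).
class AfterCommitQueue
  def initialize(real_queue)
    @real_queue = real_queue
    @pending = nil # nil means "no transaction open"
  end

  def push(job)
    if @pending
      @pending << job       # defer until commit
    else
      @real_queue.push(job) # no transaction: enqueue immediately
    end
  end

  # Stand-in for a DB transaction: commit if the block returns normally,
  # roll back (discarding deferred jobs) if it raises.
  def transaction
    @pending = []
    result = yield
    jobs = @pending
    @pending = nil
    jobs.each { |job| @real_queue.push(job) } # "after commit"
    result
  rescue StandardError
    @pending = nil # rollback: deferred jobs are thrown away
    raise
  end
end

delivered = []
queue = AfterCommitQueue.new(delivered)

queue.transaction do
  queue.push(:send_receipt)
  delivered.size # => 0, still inside the transaction
end
p delivered # => [:send_receipt]

begin
  queue.transaction do
    queue.push(:send_refund)
    raise "payment failed"
  end
rescue RuntimeError
end
p delivered # => [:send_receipt]
```

This ignores nested transactions and multiple threads; in practice the hook needs support from the ORM's transaction machinery, which is what the comment is asking for.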


I'd love to see these job queuing platforms have better support for high performance computing (HPC). Currently there are two paradigms of queuing systems. Things like PBS/Torque and Sun/Oracle/Univa Grid Engine work very well for small numbers of largish batch jobs, and things like Delayed Job, Background Job and Resque work well for huge numbers of small jobs.

When you start dealing with large jobs, system resources start to become an issue. A job might take 48GB of memory, or it might take 1GB of memory, and the scheduler needs to be aware of this so that it isn't scheduling jobs on top of each other. Or you might have some low priority jobs that should only be run when the queue is mostly empty so as not to compete with the high priority jobs. Or you might have jobs that depend on other jobs, and you want to enqueue them all and let the scheduler handle the dependencies. HPC schedulers deal with these requirements well.

On the other hand, you might be in a situation where you have tens of thousands of jobs in the queue, and you need to add and remove jobs quickly. Things like Resque and Delayed Job handle these situations well.

HPC schedulers were built for research purposes, and background job schedulers were built for web applications. However, there are more and more companies dealing with large data problems that span both worlds. They have some large jobs and tons of small jobs, and they don't want to manage two separate clusters with two schedulers to handle the tasks.
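To illustrate two of the HPC requirements described above (don't over-commit a node's memory, and don't start a job before its dependencies finish), here is a toy Ruby sketch. It is a simplified greedy simulation over discrete time steps on a single node, not how PBS/Torque or Grid Engine actually work; `Job`, `schedule`, and the whole-step durations are assumptions made for illustration.

```ruby
# Hypothetical job description: memory in GB, duration in whole time
# steps (>= 1), a priority (higher is considered first), and the names
# of jobs it depends on.
Job = Struct.new(:name, :mem_gb, :duration, :priority, :deps)

# Greedy simulation on one node with `capacity_gb` of memory.
# Returns { job_name => start_step }. Lower-priority jobs may backfill
# when a higher-priority one doesn't fit yet.
def schedule(jobs, capacity_gb)
  waiting  = jobs.sort_by { |j| -j.priority }
  running  = {} # name => finish step
  finished = []
  starts   = {}
  free     = capacity_gb
  t = 0

  until waiting.empty? && running.empty?
    # Retire jobs that finish at this step and release their memory.
    running.select { |_, finish| finish == t }.each_key do |name|
      running.delete(name)
      finished << name
      free += jobs.find { |j| j.name == name }.mem_gb
    end

    # Start every waiting job whose deps are done and that fits in memory.
    waiting.dup.each do |job|
      next unless (job.deps - finished).empty? && job.mem_gb <= free
      waiting.delete(job)
      running[job.name] = t + job.duration
      starts[job.name] = t
      free -= job.mem_gb
    end

    # Nothing running and nothing startable means no progress is possible.
    raise "job larger than the node or a dependency cycle" if running.empty? && !waiting.empty?
    t += 1
  end
  starts
end

jobs = [
  Job.new("align",  48, 3, 10, []),
  Job.new("qc",      1, 1,  1, []),
  Job.new("report",  1, 1,  5, ["align"])
]
p schedule(jobs, 48)
```

Here the 48GB job occupies the whole node, so the small "qc" job waits even though it has no dependencies, and "report" starts only once "align" has finished.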


I plucked the relevant points of discussion that reveal the thought process.

Q: "I've heard for years that pagination should remain outside rails since it has to be lightweight, and now that !?"

homakov: good example, but "pagination" is a design-related thing (like a decal on a car), while a "queue" or delayed jobs (jquery-deferred, for example) is a deep, built-in engine feature. As a car vendor you shouldn't choose decals for the driver, but you should install the best and most reliable stuff under the hood IMO

...

Q: What's the point?

josevalim: The point of the Queue is to be small and provide an API that more robust engines like resque and sidekiq can hook into. So you can easily start with an in memory queue (as you can see, the implementation does not even reach 100LOC), which is also easy to test, and then easily swap to another one. Why is this good? By having a unified API, tools like Devise and Action Mailer can simply use Rails.queue.push() instead of worrying about compatibility with different plugins. So the goal here is to provide an API for queueing along with a simple in memory implementation. It is not meant to be a robust queue system.

...

Q: Why not make it into a gem?

josevalim: The implementation today is less than 100LOC, so there is no reason to move it to an external gem. If the implementation actually grows a lot, which I highly doubt, we can surely consider moving it to a gem.

...

Q: Why include it in Rails at all?

DHH: This is really very simple: Do most full-size Rails applications, think Basecamp or Github, need to use a queue? If the answer is yes, and of course it is, this belongs in Rails proper.

...

Q: Then, and I'm not just trolling, should Rails provide an API for user authentication or authorization?

DHH: authentication, pagination, etc are all application-level concerns -- not infrastructure. Think Person model vs ActiveRecord model. Another way to think of it is, would two applications have materially different opinions on queue.push depending on what they're doing? The answer is no. That is not the case for authentication, pagination, and other application-level concerns where the usage is often very different depending on what the application is trying to do.

...

Q: Is Rails getting too big?

DHH: The size of Rails itself is not a first-order metric of either progress or decline. The right question is: Does Rails solve more common problems than before without making the earlier solutions convoluted? In other words, what are the externalities of progress? Will introducing a queue API make it harder to render templates? Or route requests? No. Its most direct influence will be on things like ActionMailer, so a fair question will be: Is it harder or easier to use ActionMailer in a best-practice way after we get this? That's a fair question, but I'm absolutely confident that this will make idiomatic AM usage (queuing mail delivery outside of the request cycle) much easier. Thus, progress.


I am curious, under what circumstances would one use this, rather than something like Resque? And there is so much competition in this space, what exactly is the argument for having this as part of Rails?

Or, let me put my question a little differently. GitHub did an awesome job writing about their experiences, and the reasoning that led them to create Resque. I'm wondering if anyone on the Rails team has posted an essay with as much background info as what GitHub did here:

https://github.com/blog/542-introducing-resque

But I'm also thinking about a conversation that happened here on Hacker News recently. 2 weeks ago: "Rails core killed ActiveResource"

http://news.ycombinator.com/item?id=3818223

and the original article touches upon the issue that I'd like to ask about here:

"It's not that I hate you or anything, but you didn't get much attention lately. There're so many alternatives out there, and I think people have made their choice to use them than you. I think it's time for you to have a big rest, peacefully in this Git repository."

Can't something similar be said about job queues? "There're so many alternatives out there, and I think people have made their choice to use them than you."?

So why create a new job queue system, and make it an official part of Rails? I am not sure I understand the intent.


> I am curious, under what circumstances would one use this, rather than something like Rescue? And there is so much competition in this space, what

The goal is not to replace the existing queue solutions, but to create a common API, so the rest of the gems can just treat all of them in a uniform way.

Quoting Jose Valim:

"The point of the Queue is to be small and provide an API that more robust engines like resque and sidekiq can hook into. So you can easily start with an in memory queue (as you can see, the implementation does not even reach 100LOC), which is also easy to test, and then easily swap to another one.

Why is this good? By having a unified API, tools like Devise and Action Mailer can simply use Rails.queue.push() instead of worrying about compatibility with different plugins.

So the goal here is to provide an API for queueing along with a simple in memory implementation. It is not meant to be a robust queue system."


It looks like this is meant to be an interface with multiple backend implementations, so Resque would become one of the potential backends.

I see this as a similar thing to having an interface for caching which can then be backed by memcached, redis or the filesystem. It strikes me as an excellent idea - pretty much every web application should have an offline queue of some sort these days.


I don't believe the intent here is to replace Resque (Resque is awesome), but to provide a slim API at Rails.queue that Resque/Delayed Job/BackgrounDRb/Torquebox/etc. could tie into, similar to how Rails.cache works now, in addition to adding a simplistic default implementation.

Considering Rails has always been about best practices--and background job queueing is definitely a best practice--I think this is a great move.

This will also allow other gems/plugins to have an easy way to push their own jobs into the queue rather than trying to support a bunch of different queue implementations.


As I understand the discussion (underneath the commit log, josevalim gives a comment), it's not about re-inventing a job queue but to offer an API for queues where you can hook in what you want. That way other services can use a queue (sending mails, processing frobnicates) through an advertised interface without having to rely on a specific implementation. You still can run resque behind it. (Caveat: I only read the discussion, this is not informed by interpreting the code)


GitHub doesn't actually use Resque directly (well, except in some rare cases). defunkt built RockQueue to be our internal queue interface while he migrated the app from DelayedJob to Resque. This looks like the same concept.


Rails 4 looks like it will have some nifty features - does anyone have any information on when the first Release Candidate will be?


I heard that it would be when it is ready :-P


A feature that may very well make me finally jump over to RoR. I've recently built quite a large site, and the only current bottleneck is when a few emails with attachments need to be sent off at the same time. Being able to add them to a "queue" and let the user continue browsing the site, instead of being stuck on a loading page (if only for a few seconds), would make the current setup ideal.

Incidentally - if anyone has a way of doing this in PHP without having to set up cron jobs (and without using node or its derivatives), I'm really open to any ideas!


I've got great news then: you can make the jump to RoR today! :)

This news isn't about Rails implementing its own background queue, but rather creating a unified API for interacting with background queuing systems; of which there are many. Resque (crafted at GitHub [1]) is probably the most popular: https://github.com/defunkt/resque.

1: https://github.com/blog/542-introducing-resque


There's also a PHP port of Resque that's fully compatible with the Ruby resque web interface - https://github.com/chrisboulton/php-resque

I've used it in production for a few different projects and highly recommend it (both the PHP and Ruby versions).


Although certainly not without its issues, the most popular solution for that platform is Gearman (http://gearman.org). It's fairly ops-intensive, but it's the most PHP-friendly option that doesn't force you to resort to things like Stomp to interface with messaging (MQ) systems, which are not optimally designed for job enqueuing, per se.


Why do you say that messaging systems are not designed for queueing? You seem to imply that servers like RabbitMQ won't do the job.


Why wouldn't you want to use a DB queue or something, and have a separate cronjob/process send the outgoing email?


With this commit, you can if you want. This code decouples Rails from external queue solutions. If your application needs to interact with a queue, you only have to write it once and you can use a standard API to do it. If your external queue solution (your DB queue code) conforms to the API, you can switch it out with another conforming solution when your needs call for it.

As someone pointed out in the OP comments, this is like Rack for queues.


A queue is FIFO-oriented; a database is least-recently-used (LRU). It works, but it is not going to be the most efficient tool.

Where a queue is really useful is converting from foreground to background, so that you can optimize for throughput, rather than having to leave free capacity on your foreground servers for 'random arrivals'. Think of it as the same problem as the bursty traffic that a bank machine gets, and why you always seem to have to line up.

The mathematical term is Poisson distribution: http://en.wikipedia.org/wiki/Poisson_distribution
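To put a number on the 'random arrivals' point: if requests arrive as a Poisson process averaging λ per second, the probability of exactly k arrivals in a given second is λ^k e^(-λ) / k!. This small Ruby sketch computes how often a one-second window sees at least twice the average load; the rate of 5/s and the threshold of 10 are arbitrary example values, not figures from the thread.

```ruby
# Probability of exactly k arrivals in one interval for a Poisson
# process with mean lam arrivals per interval: lam^k * e^-lam / k!
def poisson_pmf(k, lam)
  Math.exp(-lam) * lam**k / (1..k).reduce(1, :*)
end

# Probability of at least k arrivals in one interval.
def poisson_tail(k, lam)
  1.0 - (0...k).sum { |i| poisson_pmf(i, lam) }
end

lam = 5.0 # example: 5 requests per second on average
printf("P(>= 10 arrivals in a second) = %.4f\n", poisson_tail(10, lam))
```

With these values it prints about 0.0318: roughly one second in thirty sees double the average load, which a foreground-only design has to absorb with spare capacity, while a background queue can simply let the backlog drain.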


DB table "pending_emails" with a time field...

cronjob removes them based on time entered, and sends them.

shrug
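The cron side of that approach is only a few lines. A Ruby sketch, assuming a hypothetical `pending_emails` row shape (`to`, `subject`, `enqueued_at`), a plain Array standing in for the table, and a hypothetical `deliver` callable; in a real app the select and delete would be SQL.

```ruby
# Hypothetical row in a pending_emails table.
PendingEmail = Struct.new(:to, :subject, :enqueued_at)

# What the cron job does on each run: pick rows enqueued at or before
# `now` (oldest first), send them, and remove them from the table.
def drain_pending_emails(table, now, deliver)
  due = table.select { |row| row.enqueued_at <= now }.sort_by(&:enqueued_at)
  due.each do |row|
    deliver.call(row)
    # Delete only after a successful send, so a failed send stays
    # in the table and is retried on the next cron run.
    table.reject! { |r| r.equal?(row) }
  end
  due.size
end

sent = []
table = [
  PendingEmail.new("b@example.com", "Receipt", Time.at(200)),
  PendingEmail.new("a@example.com", "Welcome", Time.at(100)),
  PendingEmail.new("c@example.com", "Later",   Time.at(900))
]
count = drain_pending_emails(table, Time.at(500), ->(row) { sent << row.to })
p count      # => 2
p sent       # => ["a@example.com", "b@example.com"]
p table.size # => 1
```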



Triggering a background process is pretty easy; here are some script ideas to do it:

http://de2.php.net/manual/en/function.exec.php#35731

http://stackoverflow.com/questions/1019867/is-there-a-way-to...

Alternatively, you can use curl inside your page to trigger a request to another page (send_email.php) and not wait for the response.

http://tech-hacks.net/tech/13/creating-php-cronjobs-without-...


I too rolled my own, and while trivial to create, it's always made me uneasy. If there's a bug, I won't know about it; Amazon SES will reject the emails if they're sent all at once, or perhaps the calls won't be made at all.

I ended up doing a little status page for my newsletter; I set it up to auto refresh in Opera, and each refresh sends 10 emails and prints their statuses/destinations/titles as they go (it's also rate limited in memcached). I chuck that on the laptop or a third monitor and leave it for a couple of hours, keeping an eye on it as it goes.

Using something off the shelf I could trust would be much nicer.


Incidentally, Amazon SES has limits on how many mails you can send a second - even after your account is confirmed by them. You can see this limit on your control panel. Mine shows around 5 mails per second.

So you will have to add some kind of throttling to make it work.
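A minimal throttle for that in Ruby: before each send, wait until enough time has passed to stay under N sends per second. The 5/second figure is just the limit mentioned above; `Throttle` is a hypothetical class, and its `clock` and `sleeper` parameters are injected only so the logic can be checked without real waiting. It spaces sends evenly, a simpler and more conservative scheme than a token bucket that allows short bursts.

```ruby
# Spaces calls at least 1/per_second seconds apart.
class Throttle
  def initialize(per_second,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(secs) { sleep(secs) })
    @interval = 1.0 / per_second
    @clock = clock
    @sleeper = sleeper
    @next_at = nil
  end

  # Blocks until the next slot is free, then runs the block.
  def call
    now = @clock.call
    if @next_at && now < @next_at
      @sleeper.call(@next_at - now)
      now = @next_at
    end
    @next_at = now + @interval
    yield
  end
end

throttle = Throttle.new(5) # e.g. a limit of 5 mails per second
%w[a@example.com b@example.com c@example.com].each do |to|
  throttle.call { puts "sending to #{to}" } # real code would call the mail API here
end
```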


Yep, which was a big reason why I did it the way I did. I was paranoid that I'd make a slip-up in the throttling code and send too many emails (I guess they actually check it over a 10 or 60 second window), and the rest of the batch wouldn't go through properly.


It sounds to me like you need an email list server.


If you do decide to go the Ruby on Rails route, you might want to check out the ar_mailer gem as I believe it does _exactly_ what you're requesting.

https://github.com/adzap/ar_mailer


Just move your code to a register_shutdown_function() call and it will execute after the output has been sent, but without having to deal with forking a background PHP process or running out-of-context.


Rather than doing this using a queue in the webapp, why not let the on-host mailserver handle it for you?


I don't use Rails but I often look to it for good/simple design ideas. I'm interested in seeing how they implement simple, effective, reliable background queuing.


Interesting - not sure if it's really needed though. I've used Redis and Resque before and found their performance blisteringly fast. (Resque was made by GitHub: https://github.com/blog/542-introducing-resque)


It isn't about speed or choice of queue, it's about a standardized API for working with queues so you can focus on developing your application domain. You will still be able to use Resque or Sidekiq or DJ or anything else; there will just be a standard API for all of them to use.


Also, it's an ecosystem feature. If a library or a component of rails (e.g. ActionMailer) wants to process something in a background queue, the choice doesn't have to be between a host of bad options:

  * Forcing a dependency on a particular queue
  * Writing a wrapper for all possible queues
  * Falling back to queue-less behavior in the absence of a detected queue
They just use the Rails queue and it works on whatever real-world queue the user picks. Definitely good infrastructure IMO.
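The pattern being described, as a Ruby sketch: library code only ever calls `push` on whatever queue the app configured, and backends are interchangeable as long as they honor that one method. `MyApp.queue`, `SynchronousQueue`, `RecordingQueue`, `PasswordResetJob`, and `request_password_reset` are hypothetical names; the only contract assumed is the one from the thread, a `push` that takes an object with `#run`.

```ruby
# The app-wide configuration point (stands in for something like Rails.queue).
module MyApp
  class << self
    attr_accessor :queue
  end
end

# Backend 1: runs jobs immediately, handy in development and tests.
class SynchronousQueue
  def push(job)
    job.run
  end
end

# Backend 2: just records jobs; a real adapter would hand them to
# Resque, Sidekiq, Delayed Job, etc. instead.
class RecordingQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def push(job)
    @jobs << job
  end
end

# Library code (think a mailer or an auth gem): no dependency on any
# particular queue, only on MyApp.queue responding to #push.
class PasswordResetJob
  def initialize(email, outbox)
    @email = email
    @outbox = outbox
  end

  def run
    @outbox << "reset link sent to #{@email}"
  end
end

def request_password_reset(email, outbox)
  MyApp.queue.push(PasswordResetJob.new(email, outbox))
end

outbox = []
MyApp.queue = SynchronousQueue.new
request_password_reset("ann@example.com", outbox)
p outbox # => ["reset link sent to ann@example.com"]

MyApp.queue = RecordingQueue.new # swap backends; library code is unchanged
request_password_reset("bob@example.com", outbox)
p MyApp.queue.jobs.size # => 1 (queued, not yet run)
```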


I think the point is to provide an abstraction layer, so that the community has some common feature set and protocol expectations when we're discussing different technical solutions to queuing.


That and you hopefully have a nice working default for development.


ahh ok - sounds good then! bravo and carry on!


I'm sure there will be plenty of folks raging against it, but I for one am glad to see the addition.


Coupled with that, I would love to see Passenger support background workers with the same lifecycle as front-end workers (but last time I suggested that, it wasn't planned at all, if I remember correctly).


We implemented something like this at the place I work.

We have a tiny Ruby process, based on EventMachine, that subscribes to various queues (we happen to use RabbitMQ). When a message arrives, the process makes a request to the Passenger instance, passing along the message data, and waits for a response. The process limits the number of requests it makes to prevent background requests from blocking out front-end requests (for example, 20% of passenger_max_pool_size). We're also simulating priority by using different prefetch values for different queues (for example, 10 messages for the high queue and 5 messages for the low queue).
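The concurrency cap described here (background requests limited to a fraction of the Passenger pool) boils down to a counting semaphore. A Ruby sketch using a SizedQueue of tokens as the semaphore; the 20% figure comes from the comment, while `BackgroundDispatcher`, its `pool_size`/`fraction` parameters, and the thread-per-message dispatch are hypothetical stand-ins for the EventMachine process and its HTTP calls.

```ruby
# Caps how many background requests are in flight at once, so they can
# occupy at most a fraction of the app-server pool.
class BackgroundDispatcher
  attr_reader :limit

  def initialize(pool_size, fraction: 0.2)
    # At least one slot, so a small pool never ends up with a cap of 0.
    @limit = [(pool_size * fraction).floor, 1].max
    @tokens = SizedQueue.new(@limit)
    @limit.times { @tokens.push(:token) }
  end

  # Runs the handler on its own thread once a token is free; the handler
  # stands in for "make an HTTP request to the app and wait for it".
  def dispatch(message, &handler)
    @tokens.pop # blocks while `limit` requests are already in flight
    Thread.new do
      begin
        handler.call(message)
      ensure
        @tokens.push(:token) # free the slot even if the handler raises
      end
    end
  end
end

dispatcher = BackgroundDispatcher.new(10) # pool of 10 -> at most 2 in flight
threads = (1..5).map do |n|
  dispatcher.dispatch("message #{n}") do |msg|
    sleep 0.05 # stand-in for the round trip to the app server
    puts "handled #{msg}"
  end
end
threads.each(&:join)
```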


That's awesome to see. This was part of tenderlove's keynote.


separating API from actual implementation is always a good thing.

Like in Java, where the JMS API has many implementations.


This strikes me as something that should be decoupled from rails.


Maintaining interoperability between plugins (gems etc) is a perpetual headache.

In some ways, I'm surprised this hasn't been there all along.

On the other hand I'm a little surprised something this simple is being celebrated as a big deal.

It's nice to see Rails continue to evolve; time will tell how much it ends up looking like the over-arching frameworks it set out to under-do.


...why? This isn't supposed to be the one true rails queueing implementation, just a standard interface to code against. Without it third party libraries have to resort to all sorts of ugly workarounds to push work into the background ... or just not provide that feature. With this in place third party tools can push into the queue without knowing or caring what specific kind of queue you're using.


Rails is literally unstoppable


Great news. A built-in background job queue should reduce the rails learning curve - simpler to use a default option than research and test the various custom options that are available now.



