Show HN: Sneakers – Fast background processing for Ruby (sneakers.io)
114 points by jondot 1354 days ago | 37 comments



Author here - Hi! :)

Sneakers is a high-performance background-job processing framework based on RabbitMQ.

It uses a hybrid process-thread model where many processes are spawned (like Unicorn) and many threads are used per process (like Puma), so all your cores max out and you get the best of both worlds.

It's being used in production for I/O intensive jobs as well as CPU intensive jobs.

On a 2012 MBP it reaches 7000req/s for a silly microbenchmark, while Sidekiq stays in the hundreds (600-700req/s).
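
To make the hybrid model above concrete, here is a minimal sketch in plain Ruby. It is not Sneakers' actual code (Sneakers builds on serverengine); the `run_hybrid` helper and its parameters are made up for illustration, and it assumes a platform where `fork` is available (Linux, macOS).

```ruby
# Hypothetical helper: fork several processes, run several threads in each,
# and collect one line of output per thread through a pipe.
def run_hybrid(processes:, threads_per_process:)
  readers = Array.new(processes) do |p|
    reader, writer = IO.pipe
    fork do
      reader.close
      # Inside each child process, fan out into several threads.
      workers = Array.new(threads_per_process) do |t|
        Thread.new { "process #{p} thread #{t}" }
      end
      workers.each { |w| writer.puts(w.value) }
      writer.close
      exit!(0) # leave the child without running at_exit hooks
    end
    writer.close # the parent keeps only the read end
    reader
  end

  lines = readers.flat_map { |r| r.readlines(chomp: true) }
  Process.waitall # reap the child processes
  lines
end

results = run_hybrid(processes: 2, threads_per_process: 3)
puts results.size
```

Processes get around the MRI global lock for CPU-bound work, and threads inside each process keep I/O-bound jobs cheap. That is roughly the trade the model is making.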


Have you considered pluggable backends (with a Sidekiq/Resque/Redis backend) so you can easily upgrade a Sidekiq stack to Sneakers?


Yes. I dislike reinventing the wheel, and I actually banged my head against making Sidekiq use a different backend.

Three problems. First, it is too tightly coupled to Redis. Second, Celluloid (the actor model behind Sidekiq) proved to be part of the problem here. Lastly, I wanted both a process and a thread processing model, which doesn't exist there.

Then I tried rigging Celluloid to do all of this, but after failing and speaking with Celluloid committers I understood that it is under "major revision" and that they are not happy with some of the core elements right now (threadpool).

THEN, I opted to reinvent the wheel. Even here, I've chosen serverengine as the basic process management infrastructure to work on - a production tested core, to build Sneakers on.


How hard would it be to use a PostgreSQL backend with sneakers?


Not hard, because most things are abstracted (though there's a trade-off between too much abstraction and performance).

But I feel it might be missing the point. RabbitMQ was used to remove the typical broker bottleneck that exists in most Ruby background frameworks, and using Postgres would bring it back. All in all, RabbitMQ will give you a good, transparent HA story with active-active, which is useful when you must not lose messages (Postgres will give you just failover).

Edit: to cover @artellectual's response - I think OP meant to swap RabbitMQ with Postgres as the backend but if OP meant "is it possible to use existing Rails models / etc" - then yes, @artellectual is completely right - easy to do.


I meant using PostgreSQL as the backend.

Benefits:

* when you back up your main data store, your jobs are also backed up. If you use sidekiq, are you remembering to do frequent backups of redis?

* only one data store, not multiple. Simpler architecture.

* jobs can be inserted/updated in the same transactions as the rest of your data

* PostgreSQL supports listen/notify, don't have to poll for new jobs

* can use SQL to query jobs

* jobs can have foreign key constraints to the rest of your data


It doesn't hurt to keep a higher-level job representation in DB and sync the state at each step of the processing. I don't know about sneakers but you might want to have more states than the usual new/processing/success|fail.

Obviously it really depends on what you're doing, if the jobs are re-entrant, if they're the fire-and-forget kind of types, ... Still in most of the cases I would just store the jobs as a table entry and pass the table+id to the job processor. Even if PostgreSQL was the primary data store for the job processor there would still be issues with external state that needs to be healed when restoring from backup.


I understand. You're completely right about Sidekiq.

For RabbitMQ - you can have jobs persisted to disk on all nodes in the cluster which is like a backup. And of course transactional messaging is a problem that needs special attention.


I don't see why you can't just include ActiveRecord or DataMapper and connect to the same DB as your app (assuming you want to allow your workers to have access to the application database). You might have to copy the models over, though. But if you want to keep it simple you might not need to copy your models over, since models for workers are usually much simpler than the application's models.

This is of course talking about the stand-alone version. And I am talking of course about using it in tandem with the rabbitmq for job storage.


I have been using Sidekiq in production for over a year now, running over 11B jobs over that year, peaking around 20m jobs per day.

I've been wanting to migrate away from it to use more of a micro-service approach, so the code doesn't live inside the monolithic rails application.

Bunny looked like a start, but I really wanted something more. This seems like the answer, and I will definitely be using it for the migration.

@jondot, great job! Looks like a solid project!


11B == 1B, sorry


Thanks man :)


Look forward to checking this out. The biggest win regarding Sidekiq for me wasn't so much performance (which was definitely better than Resque) but its retry logic, error reporting, batch handling (granted, that's paid), ease of plugging into an existing app's security model, and more.

Without having looked at it yet, how does the pre-built UI compare? From an initial glance, it looks like anything that's missing is available via the DSL, but Sidekiq gave me so much of what I needed out of the box (having come from a custom-made solution using SQS + open source CFML) it made my head explode.


I can definitely say that RabbitMQ (and the awesome bunny ruby library) is doing the heavy lifting.

Batch handling - is actually "prefetch" in RabbitMQ/AMQP. Error reporting - via logging, and a "dead letter mailbox" (called a dead-letter-exchange in RabbitMQ) which is a great enterprise integration pattern for properly handling errors and retries in jobs.

RabbitMQ does the heavy lifting in another surprising facet - the UI. The management UI is excellent and at some point in time I decided not to compete with it by creating my own UI (you have to pick your fights :).

Here's some more info about the management plugin for RabbitMQ.

http://www.rabbitmq.com/management.html


The thing about Sidekiq is the management is closer to my app, but I'd of course need to look into Rabbit's UI before making any decisions.

I suspect that if this project takes off someone will put together a front end that mirrors Sidekiq's in short order.


Sidekiq's retry and robust handling of jobs (at least with reliable queuing in Pro) was the big win indeed. I had an app I tried switching over to it, though, and it did not go well. In my case I'm running background jobs that take 4-5 minutes to complete and are pretty CPU intensive, and I suspect that because Sidekiq is threaded it's only using one CPU core and completely thrashing the machine. I can run about twice as many workers with Resque as I can with Sidekiq, so I had to switch back. Sadly, now I've lost the reliable queuing and automatic retries (the latter at least is not too hard to implement in an ensure block).

I do wonder how this one would work out though, will have to give it a try at some point.
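
For the hand-rolled retries mentioned above, a rough sketch might look like the following. Note that it uses `rescue` with Ruby's `retry` keyword rather than an `ensure` block, since `ensure` runs on success too. The `with_retries` helper and its parameters are invented for this example; this is not how Sidekiq or Resque implement retries.

```ruby
# Hypothetical helper: run the block, retrying on StandardError with
# exponential backoff, and re-raise once max_attempts is reached.
def with_retries(max_attempts: 3, base_delay: 0.0)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    raise if attempts >= max_attempts # give up and surface the last error
    sleep(base_delay * (2**(attempts - 1)))
    retry # re-run the begin block from the top
  end
end

calls = 0
result = with_retries(max_attempts: 3) do |attempt|
  calls += 1
  raise "transient failure" if attempt < 3
  :done
end
```

In a real job you would probably want to log each failure and set a non-zero `base_delay`. The default of zero here just keeps the example fast.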


More or less the same. Plus it will use all cores on MRI, so you can keep using gems with C-Extensions (i.e. you don't have to run rbx or JRuby to max all cores).

Discussing reliability - this is something that sadly Sidekiq will never give you, by virtue of the fact that it uses Redis. RabbitMQ can be clustered in active-active mode, which means you have won over reliability here by just using a cluster.

When comparing queue systems, comparing Sidekiq+Redis to RabbitMQ is a bit unfair - because RabbitMQ was born to do this. And that's why if you're doing proper background jobs and messaging it's better to pick the right tool.

That being said, I do keep using Sidekiq for small Rails apps for the typical background emailers, denormalizers, etc. But I keep an eye open for when I realize that I'm doing proper messaging - in which case I'll switch over to something like Sneakers.


I've got to say that the more I hear "Redis can never be reliable" the more I cringe. It just seems like one of those things that's been said and repeated without people stopping to fact-check along the way.

Redis Clustering tutorial: http://redis.io/topics/cluster-tutorial

Redis persistence (using AOF or RDB or both): http://redis.io/topics/persistence


Right now Redis clustering is not production-ready. I wish it were. As I stated in the Wiki, you'll have to pry Redis from my dead body. I am very happy with it, and for me it's a true Swiss army knife and I've used it as such.

Even though it doesn't have clustering - it's rock solid in production and I haven't experienced a drop in one of my Redis servers in around 3 years.

That being said, if you are building a system where reliability is an explicit requirement you can't take those risks.


Still in beta/alpha as far as I last checked (we've been waiting for it). Additionally as far as clustering is concerned there still seem to be a lot of 'unknowns' for me with clustering Redis. Using a cluster in RabbitMQ is dead simple and it just works.


you might want to do some fact checking yourself before you cringe.

i would start here: http://aphyr.com/tags/Redis


On a long-running job, I can see that being an issue.

However, what was your concurrency level set at? I know when I've set it to one, I've had good success with the most demanding tasks. You essentially lose the threaded benefit, but keep the other benefits.


What Ruby version/impl are you using? Native threads in a non-GIL implementation should use all available cores.


Resque maintainer here: in recent versions we do the RPOPLPUSH stuff, so 'reliable queueing' should be there.

At least, as reliable as you can get with Redis...
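
For readers who haven't seen the RPOPLPUSH pattern: the idea is to move a job atomically from the main list to an in-flight list, and only remove it from there once the work is done. Below is an in-memory sketch of that pattern. The arrays are stand-ins for Redis lists and the `ReliableQueue` class is hypothetical; it does not talk to Redis and is not Resque's implementation.

```ruby
# In-memory stand-in for the reliable queue pattern built on RPOPLPUSH.
class ReliableQueue
  attr_reader :pending, :processing

  def initialize(jobs)
    @pending = jobs.dup # stand-in for the main Redis list
    @processing = []    # stand-in for the in-flight list
  end

  # Same spirit as RPOPLPUSH pending processing: take from the tail of
  # pending and push onto the head of processing (atomic in real Redis).
  def reserve
    job = @pending.pop
    @processing.unshift(job) unless job.nil?
    job
  end

  # Same spirit as LREM processing 1 job: the job finished, forget it.
  def ack(job)
    index = @processing.index(job)
    @processing.delete_at(index) if index
  end

  # After a crash, anything still in flight goes back to pending.
  def recover
    @pending.push(@processing.pop) until @processing.empty?
  end
end

queue = ReliableQueue.new(%w[a b c])
first = queue.reserve  # worker takes a job and finishes it
queue.ack(first)
queue.reserve          # worker takes another job and "crashes" before ack
queue.recover          # the unacked job is back in pending
```

The job that was reserved but never acked survives the crash, which is the whole point. How durable that is in practice still depends on how Redis itself is persisted.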


This looks great. I'm really happy to see more frameworks being built on top of RabbitMQ. AMQP gets some deserved heat for its design-by-committee nature but RabbitMQ makes something really good of it.

I've used RabbitMQ for years in production, for the last two years with my Ruby background processing system Woodhouse[1]. One of the nice things I got out of using RabbitMQ was the ability to expose job arguments as AMQP headers and then to use headers exchanges to segment queues based on that. This makes it a lot easier to allocate extra resources for high-priority jobs without having to explicitly create new priority queues.

For the author: are the issues you had with Celluloid mostly due to your requirement to run on MRI? For a while I was maintaining a serviceable monkeypatch for Celluloid on MRI, but I eventually stopped needing it. It does unfortunately seem to be a bit of a moving target.

[1]: https://github.com/mboeh/woodhouse


djur - thanks :)

Yes, you nailed it. For MRI I had a bit of a different challenge. I already solved this problem a year and a half ago, and with the benefit of being able to use JRuby performance was a bit easier to reach (by dropping to "bare" Java amqp driver and Executors) - https://github.com/jondot/frenzy_bunnies


Kudos to the OP for a thorough Wiki/documentation

As always, the first question on my mind is "Why X, instead of A,B,C?" (sidekiq in this case). The OP's page is here:

https://github.com/jondot/sneakers/wiki/Why-i-built-it


Thanks danso! I'm happy that you find it useful. I also hope to integrate conclusions from relevant discussion here back into the Wiki.


Looks totally sweet.

The "auto-scaling" is still manually controlled, right? (Dynamic scaling, on-the-fly scaling?) Or does Sneakers actually change the number of processes/threads by itself depending on load?

For the less ops-savvy among us, what are some good heuristics for deciding on the balance between processes and threads?


tdumitrescu "auto-scaling" is exactly what Unicorn gives you. You can scale up or down a running pack of worker processes by sending signals (kill -USRX) to the supervisor.

The sad news is, I've gotten some feedback that I believe may be true - self-daemonizing processes are a bad practice and we should let the OS handle daemonization. And this kind of autoscaling is a bad practice in and of itself because of it. This is why I've started to deprecate this feature (passively, by just including a notice for now).

The question between number of processes and number of threads is excellent. It is mostly based on the workload - and the good news is that it's all scientific. You first need to understand the peak job run time (always try to upper-bound your jobs with timeouts) which can be had by some trial runs.

If it takes 200ms per job (I/O bound), it means each thread can do 5 units of work per second. If you need 1000 jobs/sec - you need around 200 at worst of these little guys to do work. Now, you can divide those into 4 processes on a dual-core machine (2 per core is a good rule of thumb). You end up with 50 threads per worker which is pretty relaxed.

The punchline is - if you need 1000req/s - with Sneakers the question isn't "can the broker support 1000req/s" anymore, because RabbitMQ should virtually look down and laugh at those numbers :).
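
The thread math above is easy to put into a few lines of Ruby. This is only a back-of-the-envelope sketch: `size_worker_pool` and its parameter names are made up here, and the "2 processes per core" default is just the rule of thumb from the comment.

```ruby
# Hypothetical sizing helper: given the peak job time and a target
# throughput, estimate the total threads, processes, and threads per process.
def size_worker_pool(job_ms:, target_jobs_per_sec:, cores:, processes_per_core: 2)
  # Each thread finishes 1000 / job_ms jobs per second, so we need
  # target * job_ms / 1000 threads in total.
  total_threads = (target_jobs_per_sec * job_ms / 1000.0).ceil
  processes = cores * processes_per_core
  {
    total_threads: total_threads,
    processes: processes,
    threads_per_process: (total_threads.to_f / processes).ceil
  }
end

plan = size_worker_pool(job_ms: 200, target_jobs_per_sec: 1000, cores: 2)
puts plan
```

With the numbers from the comment (200ms jobs, 1000 jobs/sec, dual-core) this gives 200 threads across 4 processes, so 50 threads each.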


For a moment, was confused, but _why's Ruby project is called "Shoes".


Sorry about that :)


I'm a big sidekiq fan but you've got my interest piqued ... aside from the nice approach to retries/failures, the best thing about sidekiq for me is ease of production setup/deployment. Redis is easy to install and sidekiq has nice capistrano tasks. They're all very easy to monitor.

I've never used RabbitMQ–is it easy to set up on, say, an Ubuntu 12.04 VPS? How do you restart Sneakers gracefully when deploying? How do you monitor Rabbit/workers? (this is probably most important)

Thanks for this project–I'm looking forward to trying it out.


I have more of a django background, so I've never used any of the libraries that you compare this to. How does Sneakers compare to something like Celery (http://www.celeryproject.org/)? Does it let you kick off async jobs and get results back, or is it just about throwing messages over the wall and letting workers process them?


My biggest obstacle with Sidekiq is communicating back to the frontend client when jobs are done (i.e. credit card processed, FacebookGraph friends cached, etc). Right now I use Pusher (websockets) with a polling fallback, but it's clunky to develop, and who likes polling? Does this solution address that at all? If not, what would you do?


Wow, I just finished deploying a Rails app that needs a lot of scrapers with Sidekiq, and it was a great process, but Sneakers looks very nice!

Are batches of jobs on the roadmap? Something like:

Batch1: FirstWorker when done successfully > MySecondWorker x 30 in parallel > ClosingWorker

Once the ClosingWorker is finished the batch is complete


Resque maintainer here: this looks really great! Congrats!



