

PinLater: An asynchronous job execution system - jwegan
http://engineering.pinterest.com/post/91288882494/pinlater-an-asynchronous-job-execution-system

======
ritwikt
Would be nice to know why RabbitMQ/ZeroMQ didn't fit the bill [scale/speed?]
-- I can somewhat understand not relying on SQS, but then there is a reliance
on EC2 anyway, right?

~~~
bjt
I thought the same thing about Rabbit. PinLater's ACK system feels like a
questionable reinvention of AMQP. I'm probably missing something.

Rabbit's advertised throughput doesn't reach 100,000/sec, but you should be
able to shard your way around that. Or use Apache Kafka, which has the
sharding built in.

~~~
bigtones
Looks like they needed built-in support for scheduled execution of jobs at a
specific time in the future. RabbitMQ does not have that (that I'm aware of)
without some horrible hacks, like one queue per message.
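
For what it's worth, the usual RabbitMQ workaround is per-message TTL plus a dead-letter exchange, which approximates "hold the job until it's due, then move it to the ready queue" — with the caveat that Rabbit only expires messages at the head of a queue, which is why people end up with one queue per delay value. The semantics can be sketched broker-free in a few lines (hypothetical names, purely illustrative):

```python
import heapq
import time

class DelayQueue:
    """Toy model of the delayed-delivery hack: hold each job until its
    due time, then 'dead-letter' it onto the ready queue. Not a real
    RabbitMQ client; names are made up for illustration."""

    def __init__(self):
        self._delayed = []  # (due_time, seq, job) min-heap
        self._ready = []
        self._seq = 0

    def put(self, job, delay_s=0.0, now=None):
        now = time.monotonic() if now is None else now
        heapq.heappush(self._delayed, (now + delay_s, self._seq, job))
        self._seq += 1

    def poll(self, now=None):
        """Move every job whose due time has passed onto the ready queue."""
        now = time.monotonic() if now is None else now
        while self._delayed and self._delayed[0][0] <= now:
            _, _, job = heapq.heappop(self._delayed)
            self._ready.append(job)
        return self._ready

q = DelayQueue()
q.put("send_email", delay_s=10.0, now=0.0)
q.put("resize_image", delay_s=1.0, now=0.0)
print(q.poll(now=5.0))  # only the 1-second job is due -> ['resize_image']
```

Unlike this toy heap, a single Rabbit queue with mixed TTLs checks expiry only at the head, so a long-delay message blocks shorter ones behind it — hence the one-queue-per-delay hacks.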

~~~
kondro
ActiveMQ has had that for a couple of versions now, though. It's another
highly performant, highly available, atomic, consistent, durable and mature
queueing server.

------
koblas
I would be interested if anybody could do a comparison between PinLater and
Celery. Feels like there is a bit of NIH involved, unless the 100K/sec rate is
something Celery could not achieve and PinLater can.

~~~
jokull
I have yet to dig deeper, but the fact that it uses Thrift and already has
worker implementations in C++ and Java is a huge plus. Celery is a Python
lock-in and makes it non-trivial to split your enqueue code and worker code.
On the other hand, since Celery is Python all the way down, it lets you
orchestrate the execution of dependent tasks. In my experience these features
contribute to a more complex codebase that is harder to maintain, test and
understand.

Celery has flexibility for message backends, but prefers RabbitMQ, which comes
with its own operations overhead and does not have real horizontal scaling
built in (it has failover techniques). RabbitMQ is great for mission-critical
messaging that must survive system restarts.

My biggest worry is relying on Redis or MySQL. If workers stop consuming
tasks, wouldn't that balloon Redis memory consumption pretty quickly? I
suppose memory consumption is a function of load and task byte size. MySQL,
on the other hand, feels like the wrong tool for this job.

~~~
asksol
Celery can very well handle 100k tasks/sec. Celery is not locked in to
Python, but you are right that it's not a drop-in solution for other
languages, as it doesn't support them natively (yet, but that will change
soon).

You can write simple workers for Celery in other languages too, and there is
one in active development for Java. But then Redis is not really a good
choice; AMQP is more convenient for interoperability.

I have no idea if Pinterest evaluated Celery, because they have not been in
contact. If they did, there are several pitfalls they could have hit at this
scale, but I'm pretty confident it would have been more than suitable. It
could have been a benefit for all of us, and chances are they have
underestimated the effort needed to maintain a custom solution.

------
omrvp
Thanks all for the comments about the PinLater post - great to see this
discussion! Wanted to clarify a few things:

* Yes, we did evaluate a few open source solutions before deciding to build in-house. We couldn't at the time find any solution that met all the requirements that we had: scale, scheduling jobs ahead of time, transactional ACKs, priority queues, configurable retry policies, great manageability (e.g. ability to rate limit a queue online, inspect running jobs, review failed job stack traces). [update: also support for clients/jobs in multiple languages: Python, Java, Go, C++]

* Note that we were building a v2 system. Unless the new system improved substantially on the existing one along multiple dimensions, it would not be worth the cost of a large-scale migration of tasks. We _had_ to meet all the requirements; we could not compromise on missing a few.

* Sorry I didn't mention explicitly in the post, but we _do_ plan to open source PinLater, hopefully later this year.

Happy to answer any other questions or specifics!
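
The post doesn't describe PinLater's internals, but as a purely illustrative sketch (in-memory, hypothetical names — not PinLater's actual design), the combination of scheduled execution, priority queues, explicit ACKs and retries boils down to a dequeue that orders pending jobs by (priority, run_at), skips jobs not yet due, and re-queues un-ACKed failures:

```python
import heapq

class JobQueue:
    """Minimal sketch of scheduled, prioritized dequeue with explicit
    ACKs. All names are hypothetical; not PinLater's actual schema."""

    def __init__(self):
        self._pending = []    # (priority, run_at, seq, body); lower priority value = more urgent
        self._in_flight = {}  # job_id -> entry, awaiting ACK
        self._seq = 0

    def enqueue(self, body, priority=1, run_at=0.0):
        heapq.heappush(self._pending, (priority, run_at, self._seq, body))
        self._seq += 1

    def dequeue(self, now):
        """Claim the most urgent job that is due, or None if none is.
        A high-priority job scheduled for the future must not block
        lower-priority jobs that are already due."""
        due = [e for e in self._pending if e[1] <= now]
        if not due:
            return None
        entry = min(due)
        self._pending.remove(entry)
        heapq.heapify(self._pending)
        self._in_flight[entry[2]] = entry
        return entry[2], entry[3]

    def ack(self, job_id, success=True):
        """Transactional ACK: on failure, the job returns to pending
        (a crude stand-in for a configurable retry policy)."""
        entry = self._in_flight.pop(job_id)
        if not success:
            heapq.heappush(self._pending, entry)

q = JobQueue()
q.enqueue("low", priority=5, run_at=0.0)
q.enqueue("urgent-later", priority=0, run_at=100.0)
job_id, body = q.dequeue(now=10.0)  # the urgent job isn't due yet
print(body)  # -> low
```

The same shape maps onto a MySQL table (`SELECT ... WHERE run_at <= NOW() ORDER BY priority LIMIT 1` plus a claimed/ACKed state column), which may be why MySQL is less of a mismatch here than it first appears.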

------
joshribakoff
Gearman is a very simple queue that is even easier than RabbitMQ (in my
opinion). You create worker threads and submit jobs to the server. The server
sends each job to a worker thread, handles re-queuing it if it fails, etc.

Basically these guys 'invented' the queue.

Also, the problem of acknowledging success without blocking can be solved
without queues, by programming in async environments like Node.js.
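
The submit/worker/re-queue pattern described above can be sketched in-process without a real Gearman server (hypothetical helper, illustrative only — real Gearman distributes this across machines):

```python
import queue
import threading

def run_workers(jobs, handler, num_workers=4, max_retries=2):
    """Gearman-style dispatch, sketched in-process: worker threads pull
    jobs from a shared queue; a failed job is re-queued for retry."""
    q = queue.Queue()
    results, lock = [], threading.Lock()
    for job in jobs:
        q.put((job, 0))  # (job, attempts so far)

    def worker():
        while True:
            try:
                job, attempts = q.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            try:
                out = handler(job)
                with lock:
                    results.append(out)
            except Exception:
                if attempts < max_retries:
                    q.put((job, attempts + 1))  # the re-queue-on-failure step

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

state = {"tries": 0}
def handler(job):
    if job == "flaky" and state["tries"] == 0:
        state["tries"] += 1
        raise RuntimeError("transient failure")  # retried automatically
    return job.upper()

print(sorted(run_workers(["a", "b", "flaky"], handler, num_workers=2)))
# -> ['A', 'B', 'FLAKY']
```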

~~~
tokenizerrr
> handles re-queuing it if it fails

Kind of. It only re-queues if you drop the socket, I believe. Just sending an
error status will not get the message re-queued, sadly. I've looked into this,
and everyone seems to be doing it manually.

------
fasteo
Sounds like a perfect fit for beanstalkd[1]. It has scheduled jobs, explicit
ACKs, priorities and some other nice features (time-to-run, write-ahead
logging, job burying/kicking, queue pausing).

We have been using it for years to process 1+ million jobs per day without a
single issue.

[1] [http://kr.github.io/beanstalkd/](http://kr.github.io/beanstalkd/)
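
Those features map directly onto beanstalkd's text protocol. A rough sketch of the relevant commands (check the protocol doc for exact syntax and responses):

```
put <priority> <delay> <ttr> <bytes>   # enqueue; <delay> schedules it, <ttr> is time-to-run
  -> INSERTED <id>
reserve                                # worker claims a ready job
  -> RESERVED <id> <bytes> + payload
delete <id>                            # explicit ACK: job done, remove it
release <id> <priority> <delay>        # put a job back, optionally delayed
bury <id> <priority>                   # park a failing job for later inspection
kick <n>                               # move up to n buried jobs back to ready
pause-tube <tube> <delay>              # pause a queue for <delay> seconds
```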

------
cyounkins
The ZeroMQ position on why brokers like this may not be a good idea:
[http://zeromq.org/whitepapers:brokerless](http://zeromq.org/whitepapers:brokerless)

~~~
mtdewcmu
I was waiting for the big reveal of the solution, but it was this:

"While traditional messaging systems tend to use one of the models described
above ("broker" model in most cases) ØMQ is more of a framework that allows
you to use any of the models or even combine different models to get the
optimal performance/functionality/reliability ratio."

------
zackkitzmiller
Is this open source?

~~~
moschlar
It doesn't look that way - it's mentioned neither in the blog post nor on
Pinterest's GitHub page.

Therefore, I don't really understand the purpose of this blog entry. Yay, they
built something. What's the use if it's not published and usable for others
too?

~~~
jwegan
The plan is to work on open sourcing it later this year.

------
dmritard96
At my old job, we were using qless and Storm; as much or as little guaranteed
processing as you want.

