

Octobot - A Low-Latency, Highly Parallel Distributed Task Queue Worker - cscotta
http://octobot.taco.cat

======
Sukotto
This is a good case-study of what to do when building a home page for your
project.

1) State what your project is about using language that a reasonably savvy
(technology-wise) user can understand, even if they're unfamiliar with the
problem domain. (This site would be better still if it linked "message queue"
for those people who don't know what that is.)

2) Eye-catching call to action icons

3) Brief list of most important features

4) Brief high-level view

5) Brief low-level view

6) Some clean and minimalist stats on why you might want to use it

7) Easy to understand links to more info.

I don't know if I will ever need this product... but I'm bookmarking it anyway
as an _outstanding_ example of how to introduce people to it. (Except for the
swear word. The word "Fucking" is really out of place)

Nice job guys.

~~~
superjared
"Portland Fucking Oregon" is definitely a PDX thing. "We're too cool to not
swear" or something. I dig it, but then I live here.

------
evgen
The throughput numbers on the chart would have seemed a bit more valid if the
task had not included stuffing the result into a protobuf, given the
notoriously slow performance of the standard Python protobuf implementation
that Celery and PyInvoke would have used. If they wanted to impress, or at
least be honest, the data would have stayed in JSON the whole way through the
stack and the worker task would have been a simple string manipulation or
something similar.

~~~
cscotta
Hi there,

The task benchmarked was from a component of our messaging stack at work that
I'd ported. My intention was to offer an example of the end-to-end performance
of a real-world task that someone might be writing, rather than a sample task
that's not much more than a no-op. I'd suggest that this particular one is
real-world as it's straight out of one of our applications. I've no intention
of trying to be dishonest here - just offering the measurements of my (and
our) internal evaluation of the tool for our needs.

While I can't provide the source of the task, I can offer the quick-and-dirty
source of the little "PyInvoker" I'd whipped up (sorry - didn't realize there
was something called PyInvoke at the time). It just takes a message,
unpacks the JSON, and uses getattr to call the appropriate task. Nothing fancy
like e-mail error notifications, retries, and the like:
<https://gist.github.com/18a30689832569d67861>
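The dispatch pattern is roughly this (a hypothetical sketch, not the gist's
actual code - the task and message names here are made up):

```python
import json

class Tasks:
    """Hypothetical task holder: each method is an invokable task."""
    @staticmethod
    def resize_image(path, width):
        return "resized %s to %dpx" % (path, width)

def dispatch(raw_message, namespace=Tasks):
    """Unpack a JSON message and call the named task via getattr."""
    msg = json.loads(raw_message)
    task = getattr(namespace, msg["task"])   # look up the task by name
    return task(*msg.get("args", []))

# e.g. a message pulled off the queue:
result = dispatch('{"task": "resize_image", "args": ["cat.jpg", 128]}')
```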

Anyhow, always and absolutely take any claims regarding the performance of a
tool with a massive grain of salt, and try them for yourself to see if they
suit your needs. Octobot's not designed to replace background processing in
most applications as tools like Celery and DelayedJob (both of which I use
myself) are great, a bit easier to write for, and a bit simpler to get up and
running, depending on the application and language being used.

There's no intention to have slighted anyone or any other project here. I'm
stoked that a lot of tools exist in this space. I just hadn't seen one on the
JVM that offered this level of simplicity and parallelism with a bit of
restraint when it comes to feature creep. But if you have an application that
demands high throughput / low latency execution of tasks, this might be worth
evaluating.

~~~
asksol
I'm interested in seeing the code/config used to benchmark Celery. The default
settings are not at all optimized for processing lots of small jobs, and you
could easily tweak it to get a 100x speed-up for that use case, e.g.:

    CELERYD_PREFETCH_MULTIPLIER = 0
    CELERY_DISABLE_RATE_LIMITS = True

Also, channels are not re-used unless you explicitly pass the Publisher, so
e.g.

    publisher = task.get_publisher()
    for i in xrange(1000):
        task.apply_async(args=(i, ), publisher=publisher)
    publisher.close()

is known to be a _massive_ speed-up for sending tasks in batch (it seems the
creation of channels is very expensive in pyamqplib).

~~~
asksol
By the way, the performance increase you're seeing with the PyInvoker (from
your gist) is most likely because it doesn't have prefetch_count enabled.

Celery enables this so a single worker doesn't suck in a million messages at a
time, and to balance the work load between available resources. As noted
previously it can be disabled.

Btw, octobot looks great, maybe we can share ideas.

~~~
cscotta
Right on, thanks Ask! I'm checking out some of this right now and might not be
able to get through it all today, but will give it a try. Just shot you a
couple messages outside of HN - love to talk when you have a chance!

------
chrisduesing
Am I the only one who got really excited upon seeing the domain was
octobot.taco.cat, and then very confused when going to www.taco.cat? It seems
to be an art gallery in Spanish or something...

I realize this is not germane to the topic, but .cat!? What other really
interesting tld's exist that I have never heard about?

~~~
listic
It really surprised me that Catalan was assigned a three-letter TLD, whereas
all country codes are two letters.

It turns out it's not a country code
([http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains#Generic_top-level_domains](http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains#Generic_top-level_domains)),
but a generic TLD alongside the likes of .com and .org, "for Web sites in the
Catalan language or related to Catalan culture", the only one of its kind in
that category. Go figure.

------
moe
Looks interesting at a glance.

But guys, seriously, make your docs available in HTML format. PDF-only docs
are out of fashion since the 90s.

------
gmcquillan
Fascinating. The thing I find interesting about this service is that it seems
to work seamlessly with multiple queue backends. That would have been really
useful at my company, where we completely swapped out our queue server
infrastructure. Nice work!

------
johngalt
Sorry to be the slow one here... What is a Distributed Task Queue Worker? What
problem does it solve?

~~~
listic
Yup. I'm just wrapping my head around the things like RabbitMQ and Redis and I
think I understand what those are for. But can someone explain straight: in
which case should I want to use this Octobot?

~~~
superjared
Octobot (like Celery, Resque, et al) is a worker, meaning that it takes
messages from a queue, such as Rabbit or Redis, and processes that message
based on a task that you write. Imagine Octobot being used to create
thumbnails for Flickr--a job that _should_ be done asynchronously.

~~~
johngalt
Do I have this correctly?

Between a list of actions, and the logic that runs those tasks is Octobot. So
in your thumbnail example the problem that would be solved would be something
along the lines of:

"I've got this code that can create a thumbnail from a given image and let me
know if it succeeded or failed, but how do I run this on my backlog of
10million images? I'd need something that can check my list of incoming images
and distribute the jobs over X number of computers. At the end it would be
great to know my failure rate, and for those failures not to block my ongoing
process of creating thumbnails."

So is Octobot there to provide a method of resource allocation? Or is it more
of a monitoring app that checks pass/fail of the jobs?

~~~
skorgu
To expand the image resizing example:

Most web gallery requests go like: (ignoring caching)

Browser -> PHP -> Database

When you upload an image you could simply handle it in-line in the server:

Browser -> PHP -> DB -> Resizer

That means the next page refresh is waiting on that resizer to finish, which
means long page latency for the most latency-sensitive component imaginable
(the fickle user).

So you really want that resize to happen asynchronously, i.e. not wait for the
result before showing the page. The roll-your-own method is to put a row in
the DB that says "Hey I need to be resized" and have a cron job or somesuch
that does the resizing:

Browser -> PHP -> DB

Resizer -> DB

This of course puts all the load on the hardest thing to scale (the DB) so you
grow out of it fast. Hence message queues. RabbitMQ, ActiveMQ, Redis, etc are
all variants of the queue, so now you have:

Browser -> PHP -> DB -> Queue

And the Queue holds all the resizing jobs that need to be done. You could just
modify your cronjob to check the Queue instead of the DB of course.

Octobot (and Celery) is a queue runner that connects to that Queue, reads in
the jobs that need to be done and runs them. So instead of a cron job you
write your resizer in a way that your queue runner understands and the runner
will manage some of the plumbing for you.

So you have

Browser -> PHP -> DB -> Queue

Queue -> Octobot -> Resizer ( -> DB to say it's done perhaps.)

Now that you're decoupled you can add more resizers, webservers, distribute
them across multiple systems, expand to EC2 to handle overflow load, etc by
leveraging what your queue provides.
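In miniature, that runner loop looks something like this (an in-process sketch
where Python's stdlib Queue stands in for a real broker, and the resizer is a
made-up stub):

```python
import queue
import threading

jobs = queue.Queue()          # stands in for RabbitMQ/Redis/etc.
results = []

def resize(image):
    """Hypothetical resizer: the task the queue runner invokes."""
    results.append("thumbnail:" + image)

def worker():
    """The queue-runner loop: pull a job, run it, repeat."""
    while True:
        image = jobs.get()
        if image is None:     # sentinel: shut the worker down
            break
        resize(image)
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# The web tier just enqueues and returns immediately:
for img in ["a.jpg", "b.jpg"]:
    jobs.put(img)

jobs.put(None)
t.join()
```

The web request never waits on the resize; to scale, you start more worker
loops (or more machines) against the same queue.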

------
jnoller
Interesting; I'm looking forward to comparing/contrasting this with my current
favorite - Celery (<http://celeryproject.org/>) which supports a variety of
backends.

------
metabrew
I like that it has an e-mail queue built in, via SMTP/SSL support. Worth
installing just for that, since that's the first thing lots of websites want a
message queue for.

------
DEinspanjer
Very interesting. We're currently working on a quick project to test
integrating Hazelcast into a distributed server to use as a queue. Did you
look at it by any chance? Curious if it was missing something that you needed.

~~~
cnlwsu
From my personal experience with these, Hazelcast seems better suited to
something like a single worker queue... We had issues when we had thousands of
distributed queues on different systems. We ended up going with Terracotta
(which handles this well if you work through the lock contention).

This project sounds promising for that kind of application, but I won't know
until I try it out.

~~~
DEinspanjer
Sorry for my confusion: is the problem you experienced one of having thousands
of queues that a small number of clients pulled from, or thousands of workers
working off a small number of queues?

In our case, we want a very small number of queues with a high rate of I/O and
dozens of reader and writer clients.

~~~
bananaandapple
Could you elaborate on where you had problems? We also plan on using a small
number of queues with a high rate of readers and writers (e.g. distributed
task execution).

------
waratuman
Interesting, but I won't ever use it. It also seems that it's just wrapping
typical queueing systems, which I would use anyway, since this only supports
the JVM.

~~~
superjared
This is a worker system that _uses_ queues, not a queue unto itself.

------
swah
Surprised to see it wasn't written in Scala, since Scala shows up first in
their list of supported languages and the Scala code example also comes first.

------
dataguy
Sounds like a nice tool. Will definitely give it a try for high work-load
statistical data processing.

------
there
did anyone else have to take a second look at that domain name?

~~~
rubashov
tako is Japanese for octopus...

~~~
there
i was more interested in the _.cat_ tld

