
Rendering 16,000 Schematics in the Cloud with RabbitMQ and PhantomJS - compumike
https://www.circuitlab.com/blog/2012/06/20/rendering-16_000-schematics-in-the-cloud-with-rabbitmq-and/
======
pjscott
Work queues are a mostly solved problem, unless you've got unusual needs, like
preposterous amounts of throughput. These guys don't have unusual needs, and
they've got a paltry 16k tasks to keep track of. For that kind of job, _use a
library._ You don't want to be the one who writes code to deal with restarting
hung workers, or shunting pathological jobs to a retry-later queue, or
changing pipeline topology live in production without a hiccup. I've written
this stuff -- we had unusual needs -- so take it from me: you don't want to
write this stuff.

If you do most of your work in Python, Celery is pretty slick:
<http://celeryproject.org/>

If you like Redis -- and who doesn't? -- then Resque or one of the alternate
implementations will work well in Ruby or a plethora of other languages:

<https://github.com/defunkt/resque>

<https://github.com/defunkt/resque/wiki/Alternate-Implementations>

Beanstalkd is pretty generic, and simple to set up. If for some reason you
can't use Celery or Resque, maybe have a look at it:

<http://kr.github.com/beanstalkd/>

Distributed task queueing is important and useful stuff, and if you half-ass
it, your infrastructure will literally explode, usually at 3:00 AM. Get a good
library for it.

~~~
memset
Can you clarify what you mean by "use a library?"

It's easy to follow the rabbitmq tutorial and build a nice little pipeline for
tasks - what is the difference between that and using a library?

~~~
pjscott
The key question is, "What happens if $INSERT_CALAMITY_HERE?"

What happens if a worker crashes? What happens if a few jobs cause the workers
to work forever? What happens if a job hits a bug in your worker code and
causes it to reliably crash every time? What happens if your queueing server
goes down? What happens if you need to migrate your queueing server to another
EC2 instance because Amazon has decided to get rid of the physical machine
it's running on?

Undesirable answers to such questions include "We accidentally drop a job
without noticing," or "We have to shut down the whole system in a user-visible
way for twenty minutes while we do maintenance," or "All the workers keep
crashing and being restarted, flapping eternally until someone notices and
manually intervenes."

EDIT: Oops, I just noticed that I didn't really answer your question about
what I meant by using a library. What I mean is, use one of the more full-
featured job queueing libraries like Celery or Resque, rather than trying to
roll your own on top of something like RabbitMQ.
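
To make one of those calamities concrete, here is a minimal sketch of the "retry a few times, then park the job" behaviour that a job queueing library gives you for free. It is plain in-process Python, not Celery's or Resque's actual API; `run_jobs`, `render`, and the limit of three attempts are all assumptions made up for illustration.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry limit, not taken from any particular library


def run_jobs(jobs, handler, max_attempts=MAX_ATTEMPTS):
    """Run each job, retrying failures; park repeat offenders instead of looping."""
    pending = queue.Queue()
    for job in jobs:
        pending.put((job, 1))  # (payload, attempt number)

    done, dead_letter = [], []
    while not pending.empty():
        job, attempt = pending.get()
        try:
            done.append(handler(job))
        except Exception as exc:
            if attempt < max_attempts:
                pending.put((job, attempt + 1))  # retry later, behind other work
            else:
                dead_letter.append((job, repr(exc)))  # give up, but keep a record
    return done, dead_letter


def render(job):
    # Hypothetical handler: one "pathological" job crashes every time.
    if job == "bad":
        raise ValueError("cannot render")
    return job.upper()


done, dead = run_jobs(["a", "bad", "b"], render)
print(done)  # ['A', 'B']
print(dead)  # [('bad', "ValueError('cannot render')")]
```

The bad job is neither dropped silently nor retried forever, which covers two of the undesirable answers above. Hung workers, broker failover, and live migration are not handled here at all, and those are the parts that are painful to write yourself.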

~~~
memset
Ah, okay, so something like Celery would handle all of those kinds of cases
for you. (Though... doesn't rabbitmq have many of those features built-in?
Queue server replication and redundancy and stuff? Though I suppose a
dedicated project would handle more of those calamities than I could think of
on my own.)

Thanks for that!

~~~
Ixiaus
RabbitMQ doesn't have all that built in at all. RabbitMQ is only an AMQP
message broker/exchange/queue agent. It's hard to articulate (and probably
should be done in a blog post) all the things that can go wrong, but rest
assured, there are a lot of things that can go wrong with RabbitMQ AND with
your worker nodes that are built on top of RabbitMQ.

Distributed workloads are a hard problem, but a solved one (for most
applications; there are some exceptions, though).

------
colanderman
_All in all, creating our professional-quality schematic render outputs in a
variety of formats is a CPU-intensive process, typically requiring roughly 10+
seconds of a single core of modern CPU time per schematic._

Wait… really? A schematic editor can render a schematic at least 30 times per
second. "Rendering" to PDF should be even faster since PDF is a vector format.
Are they rendering in hundreds of different formats?

~~~
compumike
So you can load this from a schematic format, parse it, get all the element
symbols loaded in the right places, grab and render the fonts and place them
appropriately, and construct a well-formed output like
<https://www.nerdkits.com/media/circuitlab/20120620-democircuit.pdf>,
all in 1/30th of a core-second? With a bunch of disk seeks in there too?

Of course, we could devote engineering effort to bring that 10 seconds down by
an order of magnitude, writing a fully custom PDF (and EPS, SVG, PNG, ...)
rendering engine, but part of the beauty of elastic cloud computing is that we
can make better use of existing components -- even if it's less efficient from
a CPU time perspective -- because the CPU time is cheap and programmer time is
expensive (and better used elsewhere). There's a balance, of course, but
making well-informed tradeoffs is what engineering is all about.
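
For a rough sense of scale, the arithmetic behind that tradeoff can be sketched as below. The 16,000 count and the 10 seconds per render come from the post; the worker count of 8 is purely an assumption for illustration, and the wall-clock figure ignores queueing overhead.

```python
SCHEMATICS = 16_000        # from the post title
SECONDS_PER_RENDER = 10    # "roughly 10+ seconds", so treat this as a lower bound


def core_hours(count, seconds_each):
    """Total single-core CPU time for the whole batch, in hours."""
    return count * seconds_each / 3600


def wall_clock_hours(count, seconds_each, workers):
    # Assumes perfectly parallel work with no queueing or startup overhead
    return core_hours(count, seconds_each) / workers


total = core_hours(SCHEMATICS, SECONDS_PER_RENDER)
print(round(total, 1))  # 44.4
print(round(wall_clock_hours(SCHEMATICS, SECONDS_PER_RENDER, workers=8), 1))  # 5.6
```

So the whole batch is on the order of 44 core-hours at the stated lower bound.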

~~~
revelation
Uh.. yes? Obviously you should have all the elements, symbols, fonts already
cached in memory. It's not like you have thousands of these, anyway.

Layout was already done by the user, so I don't see what part of rendering
is incurring the heavy costs here. As it stands, you have now spent lots and
lots of programmer time building a batch system from scratch.

~~~
compumike
Premature optimization can be an expensive mistake to make... It may feel
"wrong" to use 10X or even 100X CPU cycles when you "know" that you could do
it in X, but if the 100X cycle solution still fits the business requirements
of the overall system, and takes a lot less programmer time, which path would
you choose?

~~~
revelation
If, as a programmer, I see something like this render taking 10 seconds, I will
shake my head in confusion and think "what the hell is taking up all this
time". Stuff like loading the same font a hundred times, allocating memory for
all the same symbols - identifying and getting rid of that is not premature
optimization, it's plain fixing the code.

Of course you might have done all of that - no way to know unless you tell us
more about the actual pipeline.
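
As a sketch of the kind of fix I mean (nothing here is CircuitLab's actual code; `load_font` and `render_labels` are hypothetical stand-ins), memoizing the loader means repeated lookups of the same font only pay the loading cost once:

```python
from functools import lru_cache

load_count = 0  # counts how often the "expensive" load really happens


@lru_cache(maxsize=None)
def load_font(name):
    """Stand-in for an expensive font load (hypothetical; real code would read a file)."""
    global load_count
    load_count += 1
    return {"name": name, "glyphs": {}}


def render_labels(labels, font_name="sans"):
    # Every label asks for the font, but only the first call pays for loading it.
    return [(text, load_font(font_name)["name"]) for text in labels]


render_labels(["R1", "C1", "Q1"] * 100)
print(load_count)  # 1
```

Whether this helps depends on whether the time really goes into repeated loads, which only a profile of the actual pipeline would show.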

------
Ixiaus
What does having Erlang experience provide for you when using RabbitMQ? Unless
your rendering nodes were written in Erlang... In which case you could have
easily just used Erlang's built-in node connection primitives and used RPCs
instead of the much more complicated AMQP process.

I know this from personal experience. If you have a consumer-based distributed
worker pool, AMQP is overkill! My application is built in Erlang and the
worker nodes are also built in Erlang - I initially went with using RabbitMQ
to hand work out to the workers (and to receive processed results back from
workers) but I quickly found it added a lot of message latency (vs. a straight
RPC call) and I also had to write _a lot_ of management code.

Switching over to straight RPC was a great decision and I essentially kept the
"worker queue" logic in our Python application (where it is a "solved"
problem).

Big topic, for sure - I should write a blog post on how my application is
structured.

------
beambot
Wow... CircuitLab is like a modern, web-based Spice. I love this idea. I will
be using your site to quickly generate images for papers and presentations,
and I will certainly recommend it to students. Awesome!

(Are you using Spice as the underlying simulator?)

~~~
compumike
Thanks! Please do pass it around :)

We actually have a custom simulator engine running client-side in JavaScript.

------
godrik
Nice! :D

