

Innovating Cron: Announcing Norc - jeremymims
http://blog.perpetually.com/post/230779975/innovating-cron-announcing-norc

======
timf
Celery supports scheduled tasks (including recurring ones like cron):
<http://ask.github.com/celery/introduction.html>

It was originally written in the context of Django but it is a general purpose
Python library now.

~~~
darrellsilver
I've heard some great things about Celery, and I think it may be better at
handling massive amounts of tasks (10s of thousands) without using SQS like
Norc.

How does it handle logs, resources and changing trees?

Gonna have to look into it more!

~~~
timf
Instead of SQS it uses RabbitMQ. I don't know about its limits, but since a
serious RabbitMQ installation can handle something like millions of messages
per second, I assume the limits you will hit are in the persistence solution on
celeryd. It can use a Tokyo Tyrant or MongoDB backend instead of an RDBMS, and
it also supports memcached. Those seem like they would help in that department.

I don't think you can change trees after they are sent (you mean subtasks,
right?).

You can see a logging example in the "defining and executing tasks" section here:
<http://ask.github.com/celery/introduction.html#usage>

Sorry, I am only a light user at this point in time.

~~~
vegai
<http://www.rabbitmq.com/faq.html#performance>

"From our testing, we expect easily-achievable throughputs of 4000 persistent,
non-transacted one-kilobyte messages per second (Intel Pentium D, 2.8GHz, dual
core, gigabit ethernet) from a single RabbitMQ broker node writing to a single
spindle."

Or what do you mean by serious?

~~~
timf
I was thinking more about multiple brokers working together.

Also, hmm, I didn't think rabbit was so significantly behind other AMQP
implementations like zeromq: "4,100,000 messages a second" -
<http://www.zeromq.org/>

------
Hoff
Apple has supplanted cron with launchd within Mac OS X and Mac OS X Server:

<http://developer.apple.com/macosx/launchd.html>

~~~
blasdel
It also replaces /bin/init, init scripts, xinetd scripts, and more! Almost
everything that executes something automatically is now done through launchd
(except LoginWindow still does its own instead of providing input to launchd).

The only annoyance is Apple's worst-of-both-worlds XML plist format, which
replaced something fairly close to JSON with an awful pair-wise angle-bracket
shit-pile.

------
pinko
While cron is clearly pushed way past its capabilities by many people (without
always realizing it), this strikes me as a reinvention of something that's
been around forever: the batch system.

If the goals are improved reliability, management, fault-tolerance, etc.,
you'd be better off using mature software that was designed to solve this
problem, and already has good community support, like Condor.

~~~
darrellsilver
Comparing Norc to batch is similar to comparing it to cron. Many of the
limitations of batch processing come with retrying or auditing complex trees
of dependencies (bigger than a handful of tasks).

It's like wrapper scripts. They work well until they don't: managing the logs,
getting efficient execution, understanding what happens when all become quite
unwieldy, whereas a Norc-like approach proves more self-documenting and easy
to manage.

I believe resource management is limited to overall system load. Batch isn't
designed for things like managing available licenses.

I'm not a huge expert at batch, so if I've missed something let me know.

~~~
noste
_Comparing Norc to batch is similar to comparing it to cron. Many of the
limitations of batch processing come with retrying or auditing complex trees
of dependencies (bigger than a handful of tasks)._

Condor has DAGMan (<http://www.cs.wisc.edu/condor/dagman/>) for managing DAGs
of jobs, and I suspect that it can scale higher than just a handful of tasks.
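To illustrate what "managing a DAG of jobs" means in practice, here is a minimal Python sketch (Python 3.9+ for `graphlib`). It is not DAGMan's input format or API; the job names and the `run_order` helper are made up for the example. It only shows that, given each job's parents, a valid execution order can be derived mechanically.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each key maps to the set of jobs it depends on.
dag = {
    "fetch": set(),
    "parse": {"fetch"},
    "index": {"parse"},
    "report": {"parse", "index"},
}


def run_order(dependencies):
    """Return one valid order in which every job comes after its parents."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(dag)
print(order)
```

A cycle in the dependencies makes `static_order()` raise `graphlib.CycleError`, which is the kind of check a DAG manager has to do before running anything.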

_I believe resource management is limited to overall system load. Batch isn't
designed for things like managing available licenses._

Condor acts as match maker for jobs and cluster nodes using "classified
advertisements" (<http://www.cs.wisc.edu/condor/classad/>). Classads allow you
to describe the jobs and the cluster nodes, and to express arbitrary
requirements and preferences for both. This allows jobs to say "I need a node
with X", but it also allows nodes to express requirements (e.g. "I won't run
any jobs coming from the psych department"). I don't see why you couldn't use
this system for expressing requirements regarding licenses.
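The two-sided matching described above can be sketched roughly as follows. This is plain Python, not ClassAd syntax or any Condor API; the `Ad` class, the `match` function, and the attribute names (`licenses`, `department`) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Attrs = Dict[str, Any]


@dataclass
class Ad:
    """A simplified advertisement: attributes plus a requirement on the other side."""
    name: str
    attrs: Attrs
    requires: Callable[[Attrs], bool]


def match(job: Ad, nodes: List[Ad]) -> Optional[Ad]:
    """Return the first node where job and node accept each other, else None."""
    for node in nodes:
        if job.requires(node.attrs) and node.requires(job.attrs):
            return node
    return None


nodes = [
    # No licenses available here, but the node accepts any job.
    Ad("node-a", {"licenses": 0}, requires=lambda job: True),
    # Licenses available, but the node refuses jobs from one department.
    Ad("node-b", {"licenses": 2},
       requires=lambda job: job.get("department") != "psych"),
]

physics_job = Ad("sim", {"department": "physics"},
                 requires=lambda node: node["licenses"] > 0)
psych_job = Ad("survey", {"department": "psych"},
               requires=lambda node: node["licenses"] > 0)

print(match(physics_job, nodes).name)  # node-b
print(match(psych_job, nodes))         # None
```

A license requirement is just one more predicate on the node's attributes, which is why the same mechanism seems like it would cover that case.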

However, Condor appears to be _very_ complex. The PDF version of the Condor
7.3 manual has 991 pages in it. OTOH, Red Hat MRG Grid is based on Condor
(<http://www.redhat.com/mrg/grid/>), so commercial support should be
available.

~~~
timf
> _don't see why you couldn't use this system for expressing requirements
> regarding licenses_

People do just that: <http://www.cs.wisc.edu/condor/techpaper/licenses.html>

I would not say it is "complex" but ultra flexible. You will not have things
working in just one day or anything, but I've set it up without a significant
hassle. People have had a lot of ongoing success with Condor in large (and
very large) computing environments.

~~~
cabacon
I'd agree that "length of manual" is a poor heuristic for complexity. Condor
is super-customizable, and they document those options pretty extensively. To
the direct point, they've even started highlighting the use of condor as
"condor cron" in the Open Science Grid to run monitoring applications:
<http://vdt.cs.wisc.edu/releases/1.10.1/notes/Condor-Cron.html>

Without that, I'd say it would be a slightly masochistic exercise to take a
big system like condor and try to make it a cron replacement; while it clearly
has that capability, if that's not one of its use cases, it's going to be a
real bear to force it into that configuration. I've been a fringe user of
condor before, and it can result in some unfriendly "rejected your job for
unknown reasons" situations.

------
jcapote
"AutoSys is a closed, proprietary system, and SQS / RabbitMQ are queueing
systems, without support for scheduling."

Right, why would a message queue support scheduling?

~~~
darrellsilver
Norc isn't meant as a replacement for any queuing system, rather a replacement
for Cron.

We use Norc in conjunction with SQS, and find the former is good for
scheduling, management, logging, while SQS is better for lots and lots of
repetitive tasks that we want processed ASAP.

Perpetually.com crawls, archives and versions any web site on demand, with any
repeating schedule. One of the challenges is managing bursts of activity, so
what we do is use Norc to manage the timing of archive requests, then SQS to
manage farming out the actual archiving to available hosts. Since we hook SQS
into Norc we can easily monitor/audit any system delay or outage.
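The split between timing and farming out can be sketched like this. It is a rough illustration only: `queue.Queue` stands in for SQS, and `ArchiveRequest`, `enqueue_due`, and `drain` are hypothetical names, not part of Norc or of any SQS client.

```python
import queue
from dataclasses import dataclass


@dataclass
class ArchiveRequest:
    url: str
    interval: int   # seconds between archives
    next_run: int   # time, in seconds, of the next archive


def enqueue_due(requests, now, work_queue):
    """Scheduler side: queue every request that is due, then reschedule it."""
    for req in requests:
        if req.next_run <= now:
            work_queue.put(req.url)
            req.next_run = now + req.interval


def drain(work_queue):
    """Worker side: take whatever is queued, as soon as possible."""
    done = []
    while not work_queue.empty():
        done.append(work_queue.get())
    return done


work_queue = queue.Queue()
requests = [
    ArchiveRequest("http://example.com/", interval=60, next_run=0),
    ArchiveRequest("http://example.org/", interval=3600, next_run=120),
]

enqueue_due(requests, now=0, work_queue=work_queue)
first = drain(work_queue)    # only example.com is due at t=0
enqueue_due(requests, now=120, work_queue=work_queue)
second = drain(work_queue)   # both are due at t=120
print(first, second)
```

The scheduler decides when something should happen; the queue only decides who picks it up, so a burst of due requests becomes a longer queue rather than a scheduling problem.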

We also use it to handle system backups, sending alerts and other system
administration tasks.

~~~
hapless
So it's a hybrid between a batch job processor and cron? Batch cron?

------
NikkiA
fcron has always been my 'go to' for a 'better cron' (<http://fcron.free.fr/>)
and thankfully it has been available as an 'alternative' to cron in most
distros for a while now.

That said, norc appears interesting, and I'll certainly be keeping an eye on
it (I doubt I'll switch until it's a bit more established - and available in
my favourite distros :)

------
DannoHung
Is it possible to start a task inside a job and have its dependencies start
up after it finishes without starting the whole job? Like, say I have a
sequence of operations to run and, in the middle, something screws up (like a
network connection is down for some reason). Can I rerun from the step that
screwed up and get the rest to follow through?

~~~
darrellsilver
If I understand the question the answer is yes. I think you're asking if you
can retry portions of a job, like if a task breaks but doesn't exit with an
error, thus causing the rest of the job to continue.

In this case, yes, you can retry a task in the middle of a tree of tasks, and
retry all the children tasks as well. This will leave some tasks untouched
(the parents and peers) and retry the children in the same order as the first
time.

If a task in the middle of the tree fails, its child tasks will not run until
it's been skipped or successfully run.
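The retry behaviour described above can be sketched as follows. This is not Norc's code or API; `Task`, `run_tree`, `reset`, and `retry_from` are hypothetical names, and the status strings are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    name: str
    action: Callable[[], bool]            # returns True on success
    children: List["Task"] = field(default_factory=list)
    status: str = "pending"               # pending, success, error, skipped


def run_tree(task: Task, log: List[str]) -> None:
    """Run a task, then its children in order, unless the task failed."""
    if task.status != "skipped":
        task.status = "success" if task.action() else "error"
        log.append(f"{task.name}:{task.status}")
    if task.status in ("success", "skipped"):
        for child in task.children:
            run_tree(child, log)


def reset(task: Task) -> None:
    """Mark a task and everything beneath it as pending again."""
    task.status = "pending"
    for child in task.children:
        reset(child)


def retry_from(task: Task, log: List[str]) -> None:
    """Retry one task and its descendants; parents and peers are untouched."""
    reset(task)
    run_tree(task, log)


state = {"network_up": False}
notify = Task("notify", lambda: True)
upload = Task("upload", lambda: state["network_up"], children=[notify])
report = Task("report", lambda: True)
fetch = Task("fetch", lambda: True, children=[upload, report])

first_log: List[str] = []
run_tree(fetch, first_log)     # upload fails, so notify never runs

state["network_up"] = True
retry_log: List[str] = []
retry_from(upload, retry_log)  # reruns upload, then notify
print(first_log, retry_log)
```

Setting `upload.status = "skipped"` before calling `run_tree(upload, ...)` would instead let `notify` run without rerunning `upload`.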

Retrying children tasks would be a great job for a GUI for Norc because then
you could choose which children tasks to retry interactively from a web app or
somesuch...

~~~
DannoHung
Yep. That's what I wanted to know. Sorry for the poor phrasing.

I was a little confused because it seemed like norc would only start jobs at the
top level.

~~~
darrellsilver
Yeah, the terminology isn't well-defined.

Jobs start with any tasks that return True when asked due_to_run(), which can
depend on the schedule, the parents' status, etc.
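As a rough illustration of that idea, a `due_to_run()` check might combine a schedule with the parents' status like this. Only the method name comes from the comment above; the `now` parameter, the fields, the status values, and the `due_tasks` helper are assumptions, and Norc's real signature may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    start_at: int = 0                      # earliest time the task may run
    parents: List["Task"] = field(default_factory=list)
    status: str = "pending"

    def due_to_run(self, now: int) -> bool:
        """True if the task is pending, its time has come, and its parents are done."""
        if self.status != "pending":
            return False
        if now < self.start_at:
            return False
        return all(p.status in ("success", "skipped") for p in self.parents)


def due_tasks(tasks: List[Task], now: int) -> List[str]:
    """Names of the tasks a job would start at this moment."""
    return [t.name for t in tasks if t.due_to_run(now)]


backup = Task("backup", start_at=100)
compress = Task("compress", parents=[backup])
tasks = [backup, compress]

early = due_tasks(tasks, now=50)        # nothing: backup is not scheduled yet
on_time = due_tasks(tasks, now=100)     # backup only
backup.status = "success"
after_backup = due_tasks(tasks, now=100)  # compress, now that its parent is done
print(early, on_time, after_backup)
```

Under this reading there is no special "top level": any task whose conditions are met is a valid starting point.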

Glad I could clear it up!

------
djb_hackernews
I know I'm abusing github.com, but does anyone get a redirection loop error in
firefox when trying to view
<http://github.com/darrellsilver/norc/blob/master/core/models.py>?

------
amcfague
Darrell,

I know this is a bad place to ask for it, but we actually have just started a
story at my company to look for a better scheduling tool--mostly, just a
distributed cron. Do you have any preference for Q&A? Should we use HN?

If not, feel free to send me an email so I could bounce some questions off you
(amcfague at wgen dot net)

EDIT: Considering these questions are not answered in the current README, I'd
also like to be able to propose additions to the documentation as a result! :)

