Innovating Cron: Announcing Norc (perpetually.com)
47 points by jeremymims 98 days ago | 30 comments


6 points by Hoff 98 days ago | link

Apple has supplanted cron with launchd within Mac OS X and Mac OS X Server:

http://developer.apple.com/macosx/launchd.html

-----

4 points by wtallis 98 days ago | link

More importantly, http://launchd.macosforge.org/

-----

4 points by blasdel 98 days ago | link

It also replaces /bin/init, init scripts, xinetd scripts, and more! Almost everything that executes something automatically is now done through launchd (except LoginWindow still does its own instead of providing input to launchd).

The only annoyance is Apple's worst-of-both-worlds XML plist format, which replaced something fairly close to JSON with an awful pair-wise angle-bracket shit-pile.

-----

6 points by timf 98 days ago | link

Celery supports scheduled tasks (including recurring ones like cron): http://ask.github.com/celery/introduction.html

It was originally written in the context of Django but it is a general purpose Python library now.

-----

2 points by darrellsilver 98 days ago | link

I've heard some great things about Celery, and I think it may be better at handling massive numbers of tasks (tens of thousands) without relying on SQS the way Norc does.

How does it handle logs, resources and changing trees?

Gonna have to look into it more!

-----

4 points by asksol 98 days ago | link

Not sure what you mean by logs, resources and changing trees. But there is logging support (the python logging module).

Resources: There is AMQP QoS which makes sure it only receives as many tasks as it can handle.

Hard and soft task time limits are coming in 1.0 (patch ready). If the soft time limit is exceeded, an exception is raised that the task can catch to do any cleanup before the hard time limit is exceeded and the task is forcefully killed.

Rate limiting (per task type or global) using the token bucket algorithm (which allows for bursts of data) is also planned for 1.0 (patch ready and tested).
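
For readers who haven't met the token bucket algorithm, here is a minimal Python sketch of the idea. It is a generic illustration, not Celery's implementation; the class name `TokenBucket`, its parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Generic token bucket: bursts up to `capacity`, refills at `rate` tokens/sec.

    Hypothetical helper for illustration only; not part of Celery.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock                # injectable so the sketch is testable
        self.tokens = float(capacity)     # start full, so an initial burst is allowed
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, n=1):
        """Spend n tokens and return True if available; otherwise return False."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A worker would call `try_consume()` before running a task and hold the task back when it returns False. The burst behaviour comes from the bucket being allowed to fill up to `capacity` while idle.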

Otherwise you have to use OS process resource limits (CPU, memory, etc.).

Monitoring is coming in 1.0 as well: someone is working on a monitoring system with a web frontend where you can see the current state of the system (support for deleting already published tasks might be added, but then on an opt-in basis).

The current scheduling system is flawed (it uses the database, which is a dead end in my opinion), a new solution is almost ready which uses a separate centralized service that works like a clock sending out messages at schedule time: http://wiki.github.com/ask/celery/rewriting-the-periodic-tas...
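
A rough Python sketch of that "clock that sends out messages" idea, under stated assumptions: `ClockService`, `add_periodic`, and `tick` are hypothetical names, the `publish` callable stands in for a real message broker, and none of this is taken from the Celery rewrite itself.

```python
import heapq
import queue


class ClockService:
    """Central clock: keeps periodic entries in a heap and publishes when due.

    Illustrative only; names and message shape are assumptions.
    """

    def __init__(self, publish):
        self.publish = publish    # e.g. a broker client's send function
        self._heap = []           # entries: (next_run_time, sequence, task_name, interval)
        self._seq = 0             # tie-breaker so equal times keep insertion order

    def add_periodic(self, task_name, interval, first_run=0.0):
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._heap, (first_run, self._seq, task_name, interval))
        self._seq += 1

    def tick(self, now):
        """Publish one message for every occurrence due at or before `now`."""
        while self._heap and self._heap[0][0] <= now:
            due, seq, name, interval = heapq.heappop(self._heap)
            self.publish({"task": name, "scheduled_for": due})
            # Re-arm the entry for its next occurrence.
            heapq.heappush(self._heap, (due + interval, seq, name, interval))


# Usage: a plain in-process queue stands in for the message broker.
outbox = queue.Queue()
clock = ClockService(outbox.put)
clock.add_periodic("fetch_csv", interval=3600.0)
clock.tick(now=0.0)
```

The point of the design is that workers never consult a schedule table; they only consume messages, and the single clock process decides when those messages exist.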

Now for changing trees, I'm not sure what you mean here, please correct me if I misunderstood. Messages can not be changed once they have been published, so the task itself is responsible for changing the execution order. You can chain tasks, so say TaskA launches another task. You can retry tasks if they fail.

Oh and there's the message routing features made available by AMQP, which means you can have different servers/instances handle different tasks.

Celery has a lot of features, and even more are under development, so I don't think I can list them all here. I can't see anything hindering a Celery implementation of Norc, but as I read it, you started working on this before Celery started. Bad luck, when we could have shared a lot of work :(

-----

1 point by darrellsilver 98 days ago | link

That sounds super interesting! This sentence raises a question:

>You can chain tasks, so say TaskA launches another task. You can retry tasks if they fail.

How does this chaining work? Does each Task define its children? I've found that it's cleaner to separate tasks from their place in the tree. For example, a script that downloads a CSV file each hour shouldn't care how that file is used.

-----

2 points by asksol 98 days ago | link

There's nothing built in to celery for this, but some people have already made different solutions (one which may be added as standard in the future). There's a ChainableTask where you can chain tasks together, the next running when the previous is finished, and they can optionally take the result of the previous task as a parameter.
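
To make the chaining idea concrete, here is a small Python sketch. It is not the ChainableTask code mentioned above (which isn't shown in this thread); `run_chain` and the three example tasks are hypothetical, and plain callables stand in for Celery tasks.

```python
def run_chain(tasks, pass_result=True):
    """Run callables in order, each starting only after the previous one returns.

    If pass_result is True, every task after the first receives the previous
    task's return value as its single argument. An exception stops the chain.
    """
    result = None
    for index, task in enumerate(tasks):
        if pass_result and index > 0:
            result = task(result)
        else:
            result = task()
    return result


# Hypothetical tasks: the downloader does not know how its output is used.
def download_csv():
    return "a,b,c"


def parse_csv(text):
    return text.split(",")


def count_fields(fields):
    return len(fields)


field_count = run_chain([download_csv, parse_csv, count_fields])
```

Note that the chain is defined outside the tasks, so a task like `download_csv` stays independent of its place in the sequence.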

-----

2 points by timf 98 days ago | link

Instead of SQS it uses RabbitMQ. I don't know about its limits, but since a serious RabbitMQ installation can handle something like millions of messages per second, I assume the limits you will hit are in the persistence solution on celeryd. It can use a Tokyo Tyrant or MongoDB backend instead of an RDBMS, and there is memcached support as well. Those seem like they would help in that department.

I don't think you can change trees after they are sent (you mean subtasks, right?).

You can see a logging example in the "defining and executing tasks" section here: http://ask.github.com/celery/introduction.html#usage

Sorry, I am only a light user at this point in time.

-----

1 point by asksol 97 days ago | link

Just a small note: You can also use CELERY_BACKEND="amqp" to send back the result as a message, it's the most efficient way, but then you can only look up the result once (unless you send another message with the same result).

-----

1 point by vegai 98 days ago | link

http://www.rabbitmq.com/faq.html#performance

"From our testing, we expect easily-achievable throughputs of 4000 persistent, non-transacted one-kilobyte messages per second (Intel Pentium D, 2.8GHz, dual core, gigabit ethernet) from a single RabbitMQ broker node writing to a single spindle."

Or what do you mean by serious?

-----

1 point by timf 97 days ago | link

I was thinking more about multiple brokers working together.

Also, hmm, I didn't think rabbit was so significantly behind other AMQP implementations like zeromq: "4,100,000 messages a second" - http://www.zeromq.org/

-----

1 point by darrellsilver 98 days ago | link

I've been hearing really good things about RabbitMQ, and haven't been particularly impressed with SQS. It works at scale (we're currently at 10s of thousands a day, but it's probably the same performance at 100 or 1000x that...) but tasks take many seconds (like 10-30) to show up in the queue, which absolutely kills us.

We'll be adding a RabbitMQ plugin to Norc, just like SQS, when we can.

-----

3 points by pinko 98 days ago | link

While cron is clearly pushed way past its capabilities by many people (without always realizing it), this strikes me as a reinvention of something that's been around forever: the batch system.

If the goals are improved reliability, management, fault-tolerance, etc., you'd be better off using mature software that was designed to solve this problem, and already has good community support, like Condor.

-----

2 points by darrellsilver 98 days ago | link

Comparing Norc to batch is similar to comparing it to cron. Many of the limitations of batch processing come with retrying or auditing complex trees of dependencies (bigger than a handful of tasks).

It's like wrapper scripts. They work well until they don't: managing the logs, getting efficient execution, and understanding what happens when all become quite unwieldy, whereas a Norc-like approach proves more self-documenting and easier to manage.

I believe resource management is limited to overall system load. Batch isn't designed for things like managing available licenses.

I'm not a huge expert at batch, so if I've missed something let me know.

-----

3 points by noste 98 days ago | link

> Comparing Norc to batch is similar to comparing it to cron. Many of the limitations of batch processing come with retrying or auditing complex trees of dependencies (bigger than a handful of tasks).

Condor has DAGMan (http://www.cs.wisc.edu/condor/dagman/) for managing DAGs of jobs, and I suspect that it can scale higher than just a handful of tasks.

> I believe resource management is limited to overall system load. Batch isn't designed for things like managing available licenses.

Condor acts as a matchmaker for jobs and cluster nodes using "classified advertisements" (http://www.cs.wisc.edu/condor/classad/). Classads allow you to describe the jobs and the cluster nodes, and to express arbitrary requirements and preferences for both. This allows jobs to say "I need a node with X", but it also allows nodes to express requirements (e.g. "I won't run any jobs coming from the psych department"). I don't see why you couldn't use this system for expressing requirements regarding licenses.
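
The two-sided matching can be pictured with a short Python sketch. This is only an analogy: it uses plain Python predicates instead of Condor's ClassAd language, and `Ad`, `match`, and the attribute names (`licenses`, `dept`) are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Attrs = Dict[str, Any]


@dataclass
class Ad:
    """A toy 'advertisement': attributes plus a requirement on the other side."""
    name: str
    attrs: Attrs
    requires: Callable[[Attrs], bool] = lambda other: True  # default: accept anything


def match(jobs: List[Ad], nodes: List[Ad]) -> List[Tuple[str, str]]:
    """Greedily pair each job with the first free node where BOTH sides agree."""
    free = list(nodes)
    pairs = []
    for job in jobs:
        for node in free:
            if job.requires(node.attrs) and node.requires(job.attrs):
                pairs.append((job.name, node.name))
                free.remove(node)  # a node takes one job in this sketch
                break
    return pairs


nodes = [
    # This node refuses jobs from the psych department and has no licenses.
    Ad("n1", {"licenses": 0}, lambda job: job["dept"] != "psych"),
    Ad("n2", {"licenses": 2}),
]
jobs = [
    # This job needs a node with at least one available license.
    Ad("j1", {"dept": "psych"}, lambda node: node["licenses"] > 0),
    Ad("j2", {"dept": "cs"}),
]
pairs = match(jobs, nodes)
```

A license constraint is then just one more attribute that a job's requirement checks, which is the kind of use the comment above speculates about.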

However, Condor appears to be very complex. The PDF version of the Condor 7.3 manual has 991 pages in it. OTOH, Red Hat MRG Grid is based on Condor (http://www.redhat.com/mrg/grid/), so commercial support should be available.

-----

1 point by timf 98 days ago | link

> don't see why you couldn't use this system for expressing requirements regarding licenses

People do just that: http://www.cs.wisc.edu/condor/techpaper/licenses.html

I would not say it is "complex" but ultra flexible. You will not have things working in just one day or anything, but I've set it up without a significant hassle. People have had a lot of ongoing success with Condor in large (and very large) computing environments.

-----

1 point by cabacon 98 days ago | link

I'd agree that "length of manual" is a poor heuristic for complexity. Condor is super-customizable, and they document those options pretty extensively. To the direct point, they've even started highlighting the use of condor as "condor cron" in the Open Science Grid to run monitoring applications: http://vdt.cs.wisc.edu/releases/1.10.1/notes/Condor-Cron.htm...

Without that, I'd say it would be a slightly masochistic exercise to take a big system like condor and try to make it a cron replacement; while it clearly has that capability, if that's not one of its use cases, it's going to be a real bear to force it into that configuration. I've been a fringe user of condor before, and it can result in some unfriendly "rejected your job for unknown reasons" situations.

-----

3 points by jcapote 98 days ago | link

"AutoSys is a closed, proprietary system, and SQS / RabbitMQ are queueing systems, without support for scheduling."

Right, why would a message queue support scheduling?

-----

4 points by darrellsilver 98 days ago | link

Norc isn't meant as a replacement for any queuing system, rather a replacement for Cron.

We use Norc in conjunction with SQS, and find the former is good for scheduling, management, logging, while SQS is better for lots and lots of repetitive tasks that we want processed ASAP.

Perpetually.com crawls, archives and versions any web site on demand, with any repeating schedule. One of the challenges is managing bursts of activity, so what we do is use Norc to manage the timing of archive requests, then SQS to manage farming out the actual archiving to available hosts. Since we hook SQS into Norc we can easily monitor/audit any system delay or outage.

We also use it to handle system backups, sending alerts and other system administration tasks.

-----

1 point by hapless 98 days ago | link

So it's a hybrid between a batch job processor and cron? Batch cron?

-----

2 points by sophacles 98 days ago | link

The answer is in the article:

> While cron is great, it’s not geared toward solving this problem: Tasks are tied to a single computer, and they’re managed independently for each host and user from the command line.

-----

2 points by NikkiA 98 days ago | link

fcron has always been my 'go to' for a 'better cron' (http://fcron.free.fr/) and thankfully it has been available as an 'alternative' to cron in most distros for a while now.

That said, norc appears interesting, and I'll certainly be keeping an eye on it (I doubt I'll switch until it's a bit more established - and available in my favourite distros :)

-----

1 point by djb_hackernews 98 days ago | link

I know I'm abusing github.com, but does anyone get a redirection loop error in firefox when trying to view http://github.com/darrellsilver/norc/blob/master/core/models...?

-----

1 point by DannoHung 98 days ago | link

Is it possible to start a task inside a job and have its dependencies start up after it finishes, without starting the whole job? Say I have a sequence of operations to run and, in the middle, something screws up (like a network connection being down for some reason). Can I rerun from the step that screwed up and get the rest to follow through?

-----

3 points by darrellsilver 98 days ago | link

If I understand the question the answer is yes. I think you're asking if you can retry portions of a job, like if a task breaks but doesn't exit with an error, thus causing the rest of the job to continue.

In this case, yes, you can retry a task in the middle of a tree of tasks, and retry all the children tasks as well. This will leave some tasks untouched (the parents and peers) and retry the children in the same order as the first time.

If a task in the middle of the tree fails, its child tasks will not run until it has been skipped or has run successfully.
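
As a rough illustration of those retry semantics, here is a Python sketch. It is not Norc's code; `Task`, `run`, `retry`, and the status strings are hypothetical names chosen for the example.

```python
class Task:
    """Toy task node in a tree; not Norc's actual model."""

    def __init__(self, name, action, children=()):
        self.name = name
        self.action = action            # callable returning True on success
        self.children = list(children)
        self.status = "pending"         # pending | success | failed | skipped


def run(task, log):
    """Run `task` if pending; descend only if it succeeded or was skipped."""
    if task.status == "pending":
        task.status = "success" if task.action() else "failed"
        log.append((task.name, task.status))
    if task.status in ("success", "skipped"):
        for child in task.children:     # children always run in the same order
            run(child, log)


def retry(task, log):
    """Reset `task` and all its descendants, then run that subtree again.

    Parents and peers of `task` are left untouched.
    """
    def reset(node):
        node.status = "pending"
        for child in node.children:
            reset(child)

    reset(task)
    run(task, log)
```

For example, if a "download" task fails because the network is down, its children never start; calling `retry` on "download" later reruns it and then its children, without rerunning the root.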

Retrying children tasks would be a great job for a GUI for Norc because then you could choose which children tasks to retry interactively from a web app or somesuch...

-----

1 point by DannoHung 98 days ago | link

Yep. That's what I wanted to know. Sorry for the poor phrasing.

I was a little confused because it seemed like norc would only start jobs at the top level.

-----

1 point by darrellsilver 98 days ago | link

Yeah, the terminology isn't well-defined.

Jobs start with any tasks that return True when asked due_to_run(), which can depend on the schedule, the parents' status, etc.
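
A minimal Python sketch of what such a check might look like. Only the method name due_to_run() comes from the comment above; the `now` argument, the `SketchTask` class, and the `not_before` and `done` fields are assumptions for illustration, not Norc's real model.

```python
class SketchTask:
    """Hypothetical task with a schedule and parents; not Norc's actual class."""

    def __init__(self, name, not_before=None, parents=()):
        self.name = name
        self.not_before = not_before    # earliest start time, or None for "any time"
        self.parents = list(parents)
        self.done = False

    def due_to_run(self, now):
        """True when the schedule allows it and every parent has finished."""
        if self.done:
            return False
        if self.not_before is not None and now < self.not_before:
            return False
        return all(parent.done for parent in self.parents)


def start_due(tasks, now):
    """Return the names of the tasks a job would start at time `now`."""
    return [task.name for task in tasks if task.due_to_run(now)]
```

With this shape, a task in the middle of a tree becomes startable as soon as its parents are done, which is why a job need not begin at the top level.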

Glad I could clear it up!

-----

1 point by silentbicycle 98 days ago | link

Provided the events are run locally (not distributed over multiple servers), this could be done pretty easily by running make targets out of cron.

-----

1 point by amcfague 97 days ago | link

Darrell,

I know this is a bad place to ask for it, but we actually have just started a story at my company to look for a better scheduling tool--mostly, just a distributed cron. Do you have any preference for Q&A? Should we use HN?

If not, feel free to send me an email so I could bounce some questions off you (amcfague at wgen dot net)

EDIT: Considering these questions are not answered in the current README, I'd also like to be able to propose additions to the documentation as a result! :)

-----



