

We don’t run cron jobs at Nextdoor - wenbin
https://engblog.nextdoor.com/2015/06/10/we-do-not-run-cron-jobs-at-nextdoor/

======
ChuckMcM
Hats off to the Nextdoor folks for building a new scheduler.

I wish they had given more detail about what they did, though. They called
out four problems:

1) Scaling

2) Crontab editing

3) Job status

4) Job restarting.

In my experience, the scaling issue is solved by having multiple machines
which can run jobs, assuming the job doesn't have to be tied to a particular
machine (DB backups have to be on the machine with the database, of course,
but a log gathering and analyzing job can run on any machine with network
connectivity). They seem to have done that part by creating a new class of
machine called a 'taskworker'. Presumably they could distribute their
crontabs to all of those machines and have a limiter that says "don't run if
your IP is divisible by 2 or 3 or 4", or whatever the scaling factor is. More
on this in a bit.

That brings up the question of editing crontabs, which is abhorrent with a
text editor but pretty trivial with any number of 'helper' apps.

Then 3 and 4, status and restarting. I have always "wrapped" cron jobs in a
simple unified shell script that logs their start, accepts progress output,
and then provides status, without writing anything to stdout so as to avoid
needless emails. You only get email if the wrapper itself breaks, which is a
big deal for ops to deal with right away.
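
To make the wrapper idea concrete, here is a minimal sketch. It is in Python rather than the shell script described above, and the names `run_wrapped` and `log` are made up for illustration. It captures the job's output so cron has nothing to mail, and reports start and exit status through whatever logging callable it is given.

```python
import subprocess
import time


def run_wrapped(name, argv, log):
    """Run argv quietly and report start, status and duration via log()."""
    log(f"{name}: start")
    started = time.monotonic()
    # Capture stdout/stderr so nothing reaches cron and no email is sent.
    result = subprocess.run(argv, capture_output=True, text=True)
    elapsed = time.monotonic() - started
    status = "ok" if result.returncode == 0 else "failed"
    log(f"{name}: {status} rc={result.returncode} elapsed={elapsed:.1f}s")
    if result.returncode != 0:
        # Keep the tail of stderr so the failure can be diagnosed from the log.
        log(f"{name}: stderr: {result.stderr.strip()[-500:]}")
    return result.returncode
```

On a Unix host one could pass `syslog.syslog` as `log`. If the wrapper itself crashes, the traceback goes to stderr, so cron still sends mail in that one case.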

Additionally, the wrapper can work with distribution, since you can give it
rule-based run/don't-run heuristics (does this machine have the resources for
this job, is it the 'nth' machine when distributing across n machines, etc.).
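
A rough sketch of such run/don't-run rules, under my own assumptions: the function names are hypothetical, the 'nth machine' test is the IP address modulo n, and "resources" is reduced to the 1-minute load average. The modulo trick only spreads work evenly if the worker IPs are roughly contiguous.

```python
import ipaddress
import os


def is_nth_machine(ip, n_machines, slot):
    """True if this host's address, taken modulo n_machines, equals slot."""
    return int(ipaddress.ip_address(ip)) % n_machines == slot


def has_capacity(max_load, load_1min=None):
    """True if the 1-minute load average is at or below max_load."""
    if load_1min is None:
        load_1min = os.getloadavg()[0]  # Unix only
    return load_1min <= max_load


def should_run(ip, n_machines, slot, max_load, load_1min=None):
    """Combine the rules: right machine for this slot and not overloaded."""
    return is_nth_machine(ip, n_machines, slot) and has_capacity(max_load, load_1min)
```

The wrapper would call `should_run` first and exit quietly when it returns false, so the same crontab can be shipped to every machine.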

So for me the typical cron "deployment" has three parts: a wrapper that
encapsulates the various policies (distribution, frequency, etc.), a crontab
that is compiled and validated from a more human-friendly input, and a script
that runs on the syslog host and collects running/status/state statistics
from what the wrapper sends to syslog. I would hope that others are cognizant
of how flexible cron can be, and that this flexibility is more feature than
curse for the most part.
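
For the "compiled and validated" crontab, something along these lines would do as a starting point. This is only a sketch: the job dictionaries, the `compile_crontab` name and the wrapper path `/usr/local/bin/cronwrap` are all invented for the example, and the validation accepts only `*`, `*/step` and single integers, which is much less than real crontab syntax allows.

```python
# (name, min, max) for the five crontab time fields, in crontab order.
FIELDS = [("minute", 0, 59), ("hour", 0, 23), ("day", 1, 31),
          ("month", 1, 12), ("weekday", 0, 7)]


def check_field(name, value, low, high):
    """Accept '*', '*/step' or one integer in range; raise ValueError otherwise."""
    if value == "*":
        return
    if value.startswith("*/"):
        step = int(value[2:])  # int() raises ValueError on junk
        if not 1 <= step <= high:
            raise ValueError(f"{name}: bad step {value!r}")
        return
    number = int(value)
    if not low <= number <= high:
        raise ValueError(f"{name}: {value!r} outside {low}-{high}")


def compile_crontab(jobs, wrapper="/usr/local/bin/cronwrap"):
    """Turn a list of job dicts into crontab text, one wrapped job per line."""
    lines = []
    for job in jobs:
        schedule = []
        for name, low, high in FIELDS:
            value = str(job.get(name, "*"))  # unspecified fields default to '*'
            check_field(name, value, low, high)
            schedule.append(value)
        lines.append(" ".join(schedule + [wrapper, job["name"], job["command"]]))
    return "\n".join(lines) + "\n"
```

A bad entry raises `ValueError` at compile time, before anything is pushed to the machines, which is the whole point of not hand-editing the crontab.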

