
Show HN: Highlander – Stop Overlapping Python Cron Jobs - ccannon
https://github.com/chriscannon/highlander
======
fideloper
Does anyone use flock? I came across it recently and believe it serves the
same purpose; it's very useful for cron tasks:

man page: [http://linux.die.net/man/1/flock](http://linux.die.net/man/1/flock)

Example: [https://ma.ttias.be/prevent-cronjobs-from-overlapping-in-
lin...](https://ma.ttias.be/prevent-cronjobs-from-overlapping-in-linux/)
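
For reference, a typical crontab entry with flock looks something like this
(paths hypothetical); `-n` makes flock give up immediately instead of waiting,
so an overlapping run is skipped rather than queued:

```
*/10 * * * * /usr/bin/flock -n /tmp/job.lock /usr/bin/python3 /home/me/job.py
```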

~~~
gerad
Yeah we use flock for this all the time. So much so that I'm surprised this is
news.

~~~
ccannon
This is a pure Python solution.

~~~
nnutter
Does that add value for stopping overlapping cron jobs?

------
falcolas
As someone who has had to write this myself multiple times, there are a
few bits that I consider to be missing:

1) Command line verification - is the PID owned by the same type of process as
is running now? PIDs are re-used; ensure it's the same (the creation time
check helps, but it doesn't say anything about what process wrote it).

2) Process Hang Detection - Has the process actually consumed any CPU ticks in
the last minute?

3) Infinite loop detection - Is the other process stuck processing something
uselessly?

4) Killing off stuck processes - 2 or 3 true? Behead it and continue on.
Optionally do some form of alerting - stderr is probably fine.

Add these, and I would personally find it much more useful.
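
For anyone curious, checks 1 and 2 can be sketched with nothing but /proc
(Linux only; the function names and the "no CPU ticks over a window" threshold
are my own choices, not anything Highlander does):

```python
import time

def same_cmdline(pid, expected):
    """Check 1: does the process behind this PID still mention our script
    on its command line? Guards against PID reuse by an unrelated process."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
    except FileNotFoundError:
        return False  # process is gone entirely
    return expected in cmdline

def cpu_ticks(pid):
    """utime + stime, fields 14 and 15 of /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        fields = f.read().rsplit(")", 1)[1].split()
    return int(fields[11]) + int(fields[12])

def looks_hung(pid, window=60):
    """Check 2: no CPU consumed over `window` seconds suggests a hang
    (or a process legitimately blocked on I/O, so treat with care)."""
    before = cpu_ticks(pid)
    time.sleep(window)
    return cpu_ticks(pid) == before
```

Check 3 is indeed the hard one: distinguishing "stuck in a loop" from "busy"
generally needs application-level heartbeats rather than OS-level inspection.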

~~~
ccannon
To address your concerns:

1. I assume that it's the same type of process because by default the PID
file is written to the current working directory of the script. If you'd
like, you can specify a location yourself to ensure that each type of process
is grouped on one PID file.

2. Out of curiosity, how would you go about doing this?

3. I think this would be _really_ difficult to accomplish.

4. I agree; if we could somehow figure out 2 or 3, that would be great.

~~~
michaelmior
The point for #1 is that PIDs are reused. Just because the cron job previously
started a process with a PID 473 doesn't mean that PID 473 is that same
process the next time the cron job comes around to check. It's entirely
possible that the original process was finished or killed and a new process
started with the same PID.

~~~
ccannon
This is why I also store the creation time to avoid reused PIDs.
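
On Linux that pairs naturally with /proc, since a reused PID gets a fresh
start time. A minimal sketch of the idea (my own code, not Highlander's actual
implementation):

```python
import os

def start_time(pid):
    """Process start time in clock ticks since boot
    (field 22 of /proc/<pid>/stat, Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return int(f.read().rsplit(")", 1)[1].split()[19])

def is_same_process(pid, recorded_start):
    """True only if the PID exists *and* has the start time we recorded,
    so a recycled PID is correctly rejected."""
    try:
        return start_time(pid) == recorded_start
    except FileNotFoundError:
        return False  # no such PID anymore
```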

------
geertj
On systemd systems there's an easier alternative to this. You can use the per-
user systemd instance (systemctl --user) to install a .timer that activates a
.service file. If the .service is still running when the .timer next fires, it
will not be started again. Systemd is pretty good at this kind of bookkeeping.
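
For example, a minimal unit pair along these lines (file names and paths
hypothetical), enabled with `systemctl --user enable --now job.timer`:

```
# ~/.config/systemd/user/job.service
[Unit]
Description=My hourly job

[Service]
Type=oneshot
ExecStart=%h/bin/job.py

# ~/.config/systemd/user/job.timer
[Unit]
Description=Run job.service every hour

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
```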

~~~
kiallmacinnes
This really isn't meant as piling on systemd - please don't read it that way!

But, has systemd now replaced cron too?

~~~
digi_owl
Yep:
[http://www.freedesktop.org/software/systemd/man/systemd.time...](http://www.freedesktop.org/software/systemd/man/systemd.timer.html)

------
wc-
I've had a lot of success using a key in Redis with a TTL instead of a
local PID file. Although adding Redis to the picture adds a large new point of
failure, I can then have a cron job set up on multiple instances and still
ensure it only runs once across all of them.

I'm sure there is a simpler way of doing this. How have other people solved
redundantly ensuring a single cron job runs?
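
For what it's worth, the whole trick reduces to a single atomic SET with NX
and EX. A sketch (the key name and TTL are hypothetical; `client` is anything
with redis-py's `set()` signature):

```python
def try_acquire(client, key, ttl_seconds):
    """Cluster-wide 'run once': SET ... NX EX succeeds only for the first
    caller before the TTL expires; everyone else skips this run."""
    return bool(client.set(key, "locked", nx=True, ex=ttl_seconds))

# hypothetical usage with redis-py:
#   import redis
#   if try_acquire(redis.Redis(), "cron:hourly-report", 3600):
#       run_job()
```

The TTL doubles as crash recovery: if the holder dies, the key expires and the
next run proceeds.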

~~~
kiallmacinnes
The common and "traditional" way of doing distributed locking is with a
coordination service like ZooKeeper. ZooKeeper-style services have the
advantage of not needing TTLs - the moment a process dies, the lock is
released, and the next in line waiting on the lock is immediately notified.

Redis/Memcache with a TTL serves this purpose for the most part, but if you
require as close to a 100% guarantee that 1 and only 1 process holds the lock
at any given time, these will eventually fail you. Think network partitions,
tasks outlasting the TTL, replication lag/eventual consistency etc.

ZooKeeper and similar use consensus protocols like ZAB, Paxos, or Raft to
provide guarantees even in the face of failure.

~~~
ccannon
This is not meant for distributed systems.

~~~
kiallmacinnes
The parent comment was certainly talking about distributed systems!

------
ccannon
I always encounter the problem where I write Python scripts that run on a cron
job that sometimes take longer than the interval before the same cron job will
run again (e.g., I have a cron that runs every hour and one run takes 2 hours
to complete). In this scenario, you would want the first cron to complete
before the second cron is run. If Highlander sees that your cron is already
running, it returns immediately, thereby skipping that cron run.
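
The underlying pattern, stripped to its essentials, is a PID-file check at
startup. This is a generic sketch of that idea, not Highlander's actual API:

```python
import os
import sys

def already_running(pid_file):
    """True if pid_file names a PID that is still alive."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check, sends nothing
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # alive, but owned by another user

def guard(pid_file):
    """Return immediately at startup if a previous run is still going."""
    if already_running(pid_file):
        sys.exit(0)
    with open(pid_file, "w") as f:
        f.write(str(os.getpid()))
```

Note the check-then-write is racy between the read and the write; an
flock-style lock closes that window, and storing the process creation time
(as Highlander does) additionally guards against PID reuse.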

~~~
sidmitra
Does this work with celery tasks?

My usual solution is to add checks in the cron job to make sure it doesn't
repeat or duplicate anything, using an audit table. So, for example, when a
celery task triggers an email, I store an event called EMAIL_X_SENT in the
audit table with metadata and check it later before sending again.

Of course it complicates the logic a bit, but I've noticed it's the same
pattern that works everywhere, so I just abstracted most of it into a custom
task class.

Another way typically is to use a shared lock just like above except in the
cache backend. So you could probably extend highlander to use cache backend
etc.
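
The audit-table check can be sketched with sqlite3 standing in for the real
database (the table and event names here are made up, and whether to record
the event before or after acting is a real design choice; recording first, as
below, means a failed send is skipped rather than duplicated):

```python
import sqlite3

def make_audit(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS audit (
        event TEXT PRIMARY KEY,
        metadata TEXT)""")

def run_once_per_event(conn, event, action, metadata=""):
    """Run `action` only if `event` has never been recorded; the PRIMARY KEY
    makes the check atomic even with two workers racing."""
    try:
        conn.execute("INSERT INTO audit (event, metadata) VALUES (?, ?)",
                     (event, metadata))
    except sqlite3.IntegrityError:
        return False  # already done, skip
    action()
    return True
```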

~~~
ccannon
I think you could use this with celery tasks as long as each worker used a
different PID file.

------
wumbernang
Windows has had by far the best solution to this since Vista/Server 2008:
full instance control provided by the OS, fully scriptable with PowerShell,
desired state configuration (like Ansible), clustering, logging, and it's
fully event-driven, i.e., it can trigger on network/OS events, with GUI, WMI,
script, and COM integration.

Genuinely wish someone knocked out something like this. systemd is part of the
way there but not quite far enough.

~~~
ccannon
This is a multi-platform solution.

------
snide
I always bring this up any time I see "Highlander" being used for a project name.

[http://blogs.msdn.com/b/oldnewthing/archive/2014/09/23/10559...](http://blogs.msdn.com/b/oldnewthing/archive/2014/09/23/10559783.aspx)

~~~
volker48
That's a great link, but I think the project is named Highlander not because
it's the only such project with this function, but because its function is to
ensure there is only one instance of the process running.

------
88e282102ae2e5b
Why not just use the fcntl module from the standard library?
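
That would work; a minimal sketch using fcntl.flock with a non-blocking
exclusive lock (the lock-file path and the exit-on-conflict behavior are my
own choices):

```python
import fcntl
import sys

def acquire_or_exit(lock_path):
    """Take an exclusive, non-blocking flock; if another instance already
    holds it, exit immediately instead of overlapping."""
    f = open(lock_path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit(0)
    return f  # keep this handle open for the life of the process
```

The kernel releases the lock automatically when the process exits, so there is
no stale-PID-file cleanup to worry about.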

------
mrfusion
Is there a library to just parse cron strings and know when to fire?

------
userbinator
In other words, it effectively makes the process a singleton?

~~~
ccannon
If you want to look at it from an OO perspective, I guess you could say that.
It simply lets only one instance of a Python script run at a time.

