
Get notified when a periodic task doesn't run - alexknowshtml
https://deadmanssnitch.com/
======
guylhem
To those complaining about the price - the level of technicality involved,
like knowing to set MAILTO=user@domain.com in a crontab, may not justify the
price on its own. Price != costs.

The right price is the price your target consumers are willing to pay. And if,
like another commenter, you are going to build it yourself instead of forking
over $19, you obviously are not the target consumer. (And neither am I - I'll
stick to cron :-) !)

~~~
r38y
Thanks for pointing out that price != cost. The value of the service is in
never having to panic again when checking the backups during an emergency.

It is very easy to be notified when something happens (for example, MAILTO in
cron) but it's hard to know when something DOESN'T happen, especially if you
work on a lot of sites.

There are plenty of ways to be notified when something doesn't happen if you
are willing to use a little elbow grease... we use one of them to make sure
DMS itself is working properly.
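
There are indeed DIY routes. For instance, a second cron job can alert when a
heartbeat file goes stale (a minimal sketch, assuming the watched job touches a
stamp file on success; the paths and address are hypothetical, and `stat -c %Y`
is the GNU form - on BSD/macOS it would be `stat -f %m`):

```shell
# stale FILE MAX_AGE_SECONDS -> exit 0 (true) if FILE's mtime is older
# than MAX_AGE_SECONDS, or if FILE is missing entirely.
stale() {
    file=$1
    max_age=$2
    now=$(date +%s)
    mtime=$(stat -c %Y "$file" 2>/dev/null || echo 0)
    [ $((now - mtime)) -gt "$max_age" ]
}

# Example wiring from a second cron job (hypothetical path/address):
# if stale /var/run/backup.ok 86400; then
#     echo "backup hasn't run in 24h" | mail -s "backup alert" ops@example.com
# fi
```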

~~~
StavrosK
As the creator and owner of deadmansswitch.net, I hate you :P

~~~
r38y
Haha, sorry. Maybe you shouldn't squat on domains? :P

~~~
StavrosK
I didn't realize running a web app on it for seven years was squatting!

------
omni
I really shouldn't have to create an account just to figure out what your
pricing looks like.

~~~
gm
^^^ This. Plus, you say "Sign up for free" on the home page. At the very least
that's misleading.

~~~
r38y
I'm not sure how it's misleading. You can sign up and have one snitch for
free. It would be misleading if I said sign up for free and then didn't allow
you to create ANY snitches without paying.

~~~
gm
Yeah, say that exact thing _before_ people sign up. Just cut and paste "You
can sign up and have one snitch for free." onto the main page, so it's crystal
clear to everyone and people don't have to rely on hearsay from HN before
creating the free account :-)

------
Wintamute
Monit does this pretty easily <http://mmonit.com/monit/>

You can monitor web services, processes, file modified dates, directories,
loads of stuff all with email alerts and a web-front end too.

~~~
cheap
All with a thing called Round Robin Database, a database format built for
temporal data and used by every graphing solution known to the IT world.

------
barnaba
$19/month for more than one task? That's fine for companies, but I'm not going
to pay that much as an individual, who needs 3, maybe 5 tasks to track.

~~~
wrath
Agreed. I don't think my company would use this at the moment, not until it's
a proven service with a good track record and an SLA (what's your up-time
guarantee?). That said, I'd use it for my personal projects, but not for
$19/month. I'd pay about $1/month per monitor.

~~~
r38y
RE: SLA... the use case is more "I want to know sooner rather than later that
something didn't happen" than "Something didn't happen this minute, I want to
know NOW!". If and when we go down, your task won't be able to check in and
we'll end up sending you a false "positive", which is better than not knowing
at all when something fails.

~~~
ericd
Actually, false positives with alarms are really, really bad - I will start to
ignore them very quickly if they're not reliable indicators of an actual
problem.

EDIT: If you have reasonable default tolerances or the ability to set
tolerances on tasks, I'm pretty interested in trying it out - do you integrate
or have plans to integrate with pagerduty, or do you simply fire off an email?

~~~
r38y
What I'm saying is that in the rare case we are down and your service can't
check in, that one time we would send a false positive. We wouldn't be
flapping between off and on, sending you a lot of false positives. The
resolution on our checks is so coarse (the smallest is an hour) that we won't
flood you with emails in any case.

We don't replace something like monit for making sure your process keeps
running; we are validation that one-off periodic things ran... things that are
easy to forget about but are important.

~~~
ericd
Yeah, I'm not terribly worried by that kind of failure; if that happens, a
bunch of other things have already gone wrong. I would like something to keep
track of all processes, though, that can back up monit. Good to know it's got
some slack in the tolerances - I'll give it a try.

------
gm
Here's a shot at giving constructive feedback rather than bitching about the
cost, which is what most people here are doing. Point taken on the price;
here's the other stuff I can think of:

1) Disclose cost earlier. Telling people to sign up for free and then asking
them to pay you is not cool. It starts off the relationship on a bad note and
will prevent people from signing up.

2) You have a great idea here. Startups need this. We do not have time to set
up and configure nagios, or some other warning mechanism. You make a good
start by attacking the problem no one else actually wants to take care of, and
making it really easy to do so.

3) Either forget the hobbyist or make another (less expensive) level of
service for him. Do not forget there is very little money to be made on
hobbyists unless it is very easy for you to take them on. You do not want to
take on these people asking for the same service for $1 unless you can make it
much cheaper to provide service to them.

4) As some have noted: put in an SLA, and tell us why your service will not
fail. Otherwise, when no warning arrives, we will not know whether our task
ran or your service is down.

5) Put up a tutorial or more info on what you offer. Can we have planned
maintenances (ie, do not warn me about this for 4 hours)? Can we hit the
snooze button? What are the methods you use for notifications? How many
people/emails/phones will be notified per failure? The unanswered questions go
on and on.

6) Do more stuff. What about other types of monitoring? Can we group servers
so we do not have to set up each one individually? Can we monitor stuff that
IS running, where I send you a value periodically through the HTTP hit? I
want to graph stuff, like HTTP hits per hour, or HTTPD errors per minute. I
want a warning to be sent to me when I get more than X HTTPD errors per
minute, for example.

EDIT: 7) Add a trial period, this just makes sense.

So for work I run Nagios. I would love to have a Nagios set up for my side
projects, because Nagios provides a world of benefits, but if I invested my
time in setting up Nagios I would not have time to do actual development.

You are onto something good, you just need to shape it a bit.

~~~
r38y
Thanks! I'll think on some of this stuff and probably incorporate some of it.

~~~
imrehg
Still, follow your hunches first. :) Many of these ideas "sound good", but it
feels to me like you'd just be replicating the functionality of some other
service (probably something the original poster was already using).

I like this site and its simplicity a lot already; I think there are a lot
more interesting things to explore before you get on with the feature creep.
Good job!

------
jiggy2011
The problem is that this doesn't address what I have found to be the most
dangerous problem with periodic tasks.

That is when your cron job _runs_ but there is some error in part of the
script (for example maybe it writes/reads a file in a folder but the
permissions on that folder were changed since the script was written). This
causes an error which might cause a cascade of errors meaning that some other
parts of your job either fail to run or run incorrectly.

Now what happens here, do you get notified of the error or does it just get
silently eaten? It's also very possible that your system will eat the error
and then proceed to the next step (calling this API) and everything will
appear fine.

One thing I do is figure out what the expected output from the job should be;
I then pipe the output from the cron job into a file. I have a second cron job
that checks the contents of this file periodically and generates an alert if
it does not match what is expected.
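
That second-cron-job check can be sketched as a small shell helper (the log
path, marker string, and mail address here are hypothetical, and the alert
line assumes a working local MTA):

```shell
# check_output FILE PATTERN -> exit 0 if PATTERN occurs in FILE,
# non-zero if it doesn't (or the file is unreadable).
check_output() {
    grep -q "$2" "$1" 2>/dev/null
}

# Example: run from a second cron job after the real job has finished.
# if ! check_output /backups/backup.log "backup complete"; then
#     echo "backup output mismatch" | mail -s "backup alert" ops@example.com
# fi
```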

You should also try and find some way to test any generated data. For example
if you are doing a DB backup, add another table with a field that contains
data that is in some way based on the date. You can then have a task which
will try and restore old backups into another DB, it can then check this field
against the expected value for the date of the backup.

Of course none of these techniques are silver bullets and there are plenty of
things that can go wrong, it is certainly prudent to check things manually
every once in a while.

Perhaps this API could be modified to take as an input the output from
scheduled tasks and check them?

~~~
r38y
You have to make sure your periodic task acts in such a way that if a piece
fails, the rest doesn't execute (using && for instance). Then you make hitting
your special URL the last thing to execute.
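
In a crontab that chaining might look like this (a sketch; the script path is
made up, and the nosnch.in URL is just the example token from elsewhere in
this thread):

```
# Run the backup; only if it exits successfully, check in with the
# snitch URL. If any earlier step fails, the check-in never fires and
# the "didn't run" alert goes out.
0 3 * * * /home/cron/backup.sh && curl -s https://nosnch.in/c2354d53d2 > /dev/null
```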

If you have a way of checking things like this, great! You should keep doing
that!

~~~
jiggy2011
That is true, but it makes the assumption that all scripts along the way
return the correct exit values on errors etc. That's not always true for
rushed in-house scripts.

You may also have to consider warning conditions which might be more
catastrophic to your "pipe" than the original program would believe.

------
Cherian
I run this as a combination of monit timestamp checking and piping stdout to a
file. Is there anything this does that I cannot do with monit?

e.g.

  MAILTO=email@ops.com

  1 23 * * * /home/cron/backup.sh 1> /backups/backup

and monit doing a

  check file backup with path /backups/backup
  if timestamp > 24 hours then alert

This way I am emailed when there is an error with the backup script, otherwise
things continue as it is.

------
svmegatron
This will be AWESOME for monitoring daily backups on the large number of
sites/databases that _should_ be getting backed up automatically every night.
They work reliably for a while and then I forget about checking them. Would be
nice to know when one of those starts to choke.

------
Nick_C
Those using cron hacks to check on previous cron jobs might want to take a
look at Matt Dillon's cron (as opposed to Paul Vixie's cron), which is the
crond that Slackware uses.

It has named cron jobs, @noauto and AFTER keywords, which let you run jobs
depending on whether previous jobs (identified by name) have run successfully.

That, combined with using things like && and || and mailx, gives you quite
powerful ways of checking on previous cron jobs.
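
Even plain Vixie cron gets most of the way there with &&, || and mailx (a
sketch; the paths and address are hypothetical):

```
# Touch a stamp file on success; if the script (or the touch) fails,
# mail an alert instead. Later jobs can test for the stamp file.
0 2 * * * /usr/local/bin/backup.sh && touch /var/run/backup.ok || echo "backup failed" | mailx -s "backup failed" ops@example.com
```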

------
jauer
Ouch. I've been dabbling at a project that does the exact same thing and your
implementation is so perfect that I may give up and sign up :-)

------
cheap
I'm sure this will make tons of money and will be amazingly useful as a hip
tool for those developers we all know. But honestly, the rest of us build this
kind of stuff into our company dashboards without thinking twice. That's where
it belongs: in the hands of an internal team who can share the monitoring of
more than $20 worth of "snitches".

~~~
alexknowshtml
Relevant username.

~~~
cheap
If supporting, contributing to, and deploying open source software makes me
cheap, then so be it. I'm ecstatic at the chance to share that technology with
my clients and to help them give it back where applicable.

I hope your endeavors continue to make you happy with your means as they do
mine.

------
justinwr
I definitely know quite a few small projects that this could be extremely
useful for. However, I think some people are right that Munin works better at
tracking these things long term.

------
metaobject
And what service do I use to find out when this service doesn't run?

~~~
r38y
What are you currently using to make sure your periodic tasks run?

As far as making sure DMS runs, we use another third-party to make sure our
checkers run. Contact support if you are interested in finding out more.

~~~
metaobject
We have an elaborate system that someone cooked up a while ago that involves a
tree of communicating processes running on different machines that monitor our
real-time services. If we could consolidate our 'checker' processes into one,
that may be the right thing to do. Can the system be set up to use
authentication to get behind firewalls, or do the things being monitored have
to be accessible over the public Internet?

~~~
cmwelsh
The interface is a simple HTTPS request. The authentication is done using a
unique URL component. This is an example of how to use Dead Man's Snitch:

$ curl <https://nosnch.in/c2354d53d2>

Dead Man's Snitch does not ping your servers. It is the other way around: if
your server stops pinging Dead Man's Snitch, you will be notified.

