Get notified when a periodic task doesn't run (deadmanssnitch.com)
52 points by alexknowshtml on Sept 24, 2012 | 51 comments

To those complaining about the price: the level of technical difficulty (comparable to setting MAILTO=user@domain.com in a crontab) may not, on its own, explain the price. Price != cost.

The right price is the price your target consumers are willing to pay. And if, like another commenter, you are going to build it yourself instead of forking over $19, you obviously are not the target consumer. (And neither am I; I'll stick to cron :-) !)

With a crontab email, you get notified every time it runs. Their value proposition is to notify you when it doesn't run.

Part of this value is that their service keeps running. That is, you could set up an equivalent service yourself (a process which expects to be pinged periodically, and emails/SMSs you if it isn't), but what if that service itself fails? (Who shall watch the watcher?)
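The self-hosted watcher described above could look roughly like the sketch below. It is only an illustration, not how any particular service works: the function name `is_overdue` and the one-hour interval are assumptions made up for the example.

```shell
#!/bin/sh
# Minimal dead-man's-switch check (hypothetical names, illustration only).
# is_overdue LAST_CHECKIN NOW INTERVAL
#   All values are in seconds (epoch times for the first two).
#   Returns 0 (true) when more than INTERVAL seconds have passed
#   since the last check-in, 1 otherwise.
is_overdue() {
    last_checkin=$1
    now=$2
    interval=$3
    [ $((now - last_checkin)) -gt "$interval" ]
}

# Example: the watched job last checked in at t=1000, it is now t=5000,
# and a check-in is expected at least once an hour (3600 s).
if is_overdue 1000 5000 3600; then
    echo "ALERT: no check-in for more than 3600 seconds"
fi
```

Run from cron, a check like this prints only when the job is overdue, so cron's usual mail-on-output behaviour delivers the alert. It does nothing about the watcher itself dying, which is the point made above.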

It's not perfect: your periodic task might ping them without having fully worked (e.g. the backup script runs, but didn't actually back anything up). You need some kind of verification (test code) for your tasks that must succeed before you ping them. This is something they could help customers with, using blogs, articles, case studies, and especially example code for common tasks (e.g. backup). This would really help people (and incidentally publicize their service). Also, I love this copy: "Once you use it, you realize you've been doing it wrong for years."
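As a rough sketch of "verify before you ping", assuming a backup that lands in a single file: the function name `verify_backup` and the size threshold are made up for illustration, and a real check would be specific to the job (a test restore, a checksum, a row count).

```shell
#!/bin/sh
# verify_backup FILE MIN_BYTES   (hypothetical helper, illustration only)
#   Succeeds only if FILE exists and is at least MIN_BYTES long.
verify_backup() {
    file=$1
    min_bytes=$2
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips the padding some wc versions print.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -ge "$min_bytes" ]
}

# Demonstration with a temporary file standing in for a backup.
tmp=$(mktemp)
printf 'backup data' > "$tmp"
if verify_backup "$tmp" 5; then
    echo "backup looks sane; safe to check in"
fi
rm -f "$tmp"
```

In a real job the check-in request would go where the `echo` is, so an empty or missing backup never reports success.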

I run a business (appmonsta.com) that relies heavily on a whole bunch of periodic processes running when they should. I'd easily pay $19 to worry less at night about whether everything ran like it should have. I'm not sure whether Dead Man's Snitch would alleviate my worry, but the pain is strong enough to justify the price point for me.

What I will say is that I would probably want to be able to set up 3-5 snitches before I'd feel comfortable committing to setting a bunch more up, which, incidentally, is also the point at which I'd be willing to start paying money.

Thanks for pointing out that price != cost. The value of the service is in not having to ever panic again when checking the backups during an emergency.

It is very easy to be notified when something happens (for example, MAILTO in cron) but it's hard to know when something DOESN'T happen, especially if you work on a lot of sites.

There are plenty of ways to be notified when something doesn't happen if you are willing to use a little elbow grease... we use one of them to make sure DMS itself is working properly.

As the creator and owner of deadmansswitch.net, I hate you :P

Haha, sorry. Maybe you shouldn't squat on domains? :P

I didn't realize running a web app on it for seven years was squatting!

you might be thinking of deadmansswitch.com

I've used one-line cron scripts that capture the output of a command, check it, and, if it doesn't look like what it should, send me an email.
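One way such a check might look, as a sketch rather than the commenter's actual script: the helper name `report_unless_match` is hypothetical. It relies on cron normally mailing whatever a job prints, so staying silent on success means mail arrives only on failure.

```shell
#!/bin/sh
# report_unless_match PATTERN COMMAND [ARGS...]   (hypothetical helper)
#   Runs COMMAND, captures its output, and stays silent if the output
#   contains a line matching PATTERN (a grep basic regular expression).
#   Otherwise prints a report and returns 1.
report_unless_match() {
    pattern=$1
    shift
    output=$("$@" 2>&1)
    if printf '%s\n' "$output" | grep -q -- "$pattern"; then
        return 0
    fi
    printf 'Unexpected output from: %s\n%s\n' "$*" "$output"
    return 1
}

# Example: "echo" stands in for the real job.
report_unless_match 'backup complete' echo 'backup complete: 42 files'
report_unless_match 'backup complete' echo 'disk full' || true
```

The first call prints nothing; the second prints a short report, which is what cron would mail.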

I know a lot of designers that will love and support this product.

I really shouldn't have to create an account just to figure out what your pricing looks like.

^^^ this. Plus you say "Sign up for free" on the home page. At the very least it's misleading.

I'm not sure how it's misleading. You can sign up and have one snitch for free. It would be misleading if I said sign up for free and then didn't allow you to create ANY snitches without paying.

Yeah, say that exact thing _before_ people sign up. Just cut and paste "You can sign up and have one snitch for free." into the main page, so it's crystal clear to everyone, and hearsay from HN does not prevent people from creating the free account :-)

I think the problem is that without signing up, there's no indication that it's a paid service, let alone what the price actually is.

A "Pricing" page in your navigation would be ideal, IMO. Even an entry in the FAQ ("How much does this cost?") would suffice, especially if it's the top item.

Monit does this pretty easily http://mmonit.com/monit/

You can monitor web services, processes, file modified dates, directories, loads of stuff all with email alerts and a web-front end too.

All with a thing called Round Robin Database, a database format built for temporal data and used by every graphing solution known to the IT world.

$19/month for more than one task? That's fine for companies, but I'm not going to pay that much as an individual, who needs 3, maybe 5 tasks to track.

I agree with this. It's an awesome service that you could probably scale up properly with larger corporations while keeping the smaller individual projects/startups that use 3-10 tasks at lower cost or even free.

I know I could use this for 3 things on my own, but would probably rather just build it myself than pay $19/month. (although that is definitely more than reasonable if I wasn't using this for just independent projects)

Agreed. I don't think my company would use this at the moment, not until it's a proven service that has a good track record and an SLA (what's your up-time guarantee?). That said, I'd use it for my personal projects but not for $19/month. I'd pay about $1/month per monitor.

I get what you're saying, but I think you're being overly skeptical. If you've already got a system for monitoring tasks in place, there's no reason you couldn't run both. And if you don't have a system in place, then if it fails you'll be no worse off than having never signed up.

I think most SLAs for these types of services are mostly pointless anyway. If something important to your business goes down, are you really going to be worried about whether or not you have to pay the $19?

RE: SLA... the use case is more "I want to know sooner than later that something didn't happen" than "Something didn't happen this minute, I want to know NOW!". If and when we go down, your task won't be able to check in and we'll end up sending you a false "positive" which is better than not knowing at all when something fails.

Actually, false positives with alarms are really, really bad - I will start to ignore them very quickly if they're not reliable indicators of an actual problem.

EDIT: If you have reasonable default tolerances or the ability to set tolerances on tasks, I'm pretty interested in trying it out - do you integrate or have plans to integrate with pagerduty, or do you simply fire off an email?

What I'm saying is, in the rare chance we are down and your service can't check in, that one time we would send a false positive. We wouldn't be flapping between off and on sending you a lot of false positives. The resolution on our checks is so large (the smallest is an hour) that we won't be flooding you with emails in any case.

We don't replace something like monit to make sure your process continues to run, we are validation that one-off periodic things run... things that are easy to forget about but are important.

Yeah, I'm not terribly worried by that kind of failure, if that happens, a bunch of other things have already gone wrong. I would like something to keep track of all processes, though, that can back up monit. Good to know it's got some slack in the tolerances, I'll give it a try.

^ yikes.

The pricing is more based on value. It is the price my clients said they would be more than willing to pay to never have to panic again when there is some EC2 problem and the backups stopped working weeks ago.

This actually happened over the weekend:


I understand that. It's just not that valuable for my personal projects.

I'd go further and say that most people willing to pay that much (those serious about their backups and availability) already have some solution in place.

The service is very appealing to people who don't really care that much (because it's very easy to use). I'm afraid that those people won't pay $19/mo.

If I charged my clients $19.99 a month for something like this they'd laugh at me and hire someone else.

Then you have the clients no one else wants (maybe that's why you have that username?). You can keep them.

Smart people go after customers willing to pay more.

Actually, I have some old cheap clients I fired some time ago... Should I send them your way? They need someone to yell at and then laugh at when the cost of their needs is quoted.

You are totally right, my clients do want to pay more for this kind of thing. BTW, your penis is huge.

Here's a shot at giving constructive feedback rather than bitching about the cost, which is what most people here are doing. Point taken on cost; here's the other stuff I can think of:

1) Disclose cost earlier. Telling people to sign up for free and then asking them to pay you is not cool. It starts off the relationship on a bad note and will prevent people from signing up.

2) You have a great idea here. Startups need this. We do not have time to set up and configure nagios, or some other warning mechanism. You make a good start by attacking the problem no one else actually wants to take care of, and making it really easy to do so.

3) Either forget the hobbyist or make another (less expensive) level of service for him. Do not forget there is very little money to be made on hobbyists unless it is very easy for you to take them on. You do not want to take on these people asking for the same service for $1 unless you can make it much cheaper to provide service to them.

4) As some have noted: Put in an SLA, and tell us why your service will not fail. Otherwise, if we are not warned, we will not know whether it's because your service is down.

5) Put up a tutorial or more info on what you offer. Can we have planned maintenances (ie, do not warn me about this for 4 hours)? Can we hit the snooze button? What are the methods you use for notifications? How many people/emails/phones will be notified per failure? The unanswered questions go on and on.

6) Do more stuff. What about other types of monitoring? Can we group servers so we do not have to set up each one individually? Can we monitor stuff that IS running, where I send you a value periodically through the HTTP hit? I want to graph stuff, like HTTP hits per hour, or HTTPD errors per minute. I want a warning to be sent to me when I get more than X HTTPD errors per minute, for example.

EDIT: 7) Add a trial period, this just makes sense.

So for work I run Nagios. I would love to have a Nagios set up for my side projects because Nagios provides a world of benefits but if I invested my time in setting up Nagios then I would not have time to do actual development.

You are onto something good, you just need to shape it a bit.

Thanks! I'll think on some of this stuff and probably incorporate some of it.

Still, first follow your hunches. :) Many of these ideas "sound good", but somehow it feels to me like you'd then just replicate the functionality of some other service (probably something that the original poster was already using).

I like this site and its simplicity a lot already. I think there are a lot more interesting things to explore before you get on with the feature creep. Good job!

The problem is that this doesn't address what I have found to be the most dangerous problem with periodic tasks.

That is when your cron job runs but there is some error in part of the script (for example maybe it writes/reads a file in a folder but the permissions on that folder were changed since the script was written). This causes an error which might cause a cascade of errors meaning that some other parts of your job either fail to run or run incorrectly.

Now what happens here, do you get notified of the error or does it just get silently eaten? It's also very possible that your system will eat the error and then proceed to the next step (calling this API) and everything will appear fine.

One thing I do is figure out what the expected output from the job should be. I then pipe the output from the cron job into a file. I have a second cron job that checks the contents of this file periodically and generates an alert if it does not match what is expected.

You should also try and find some way to test any generated data. For example if you are doing a DB backup, add another table with a field that contains data that is in some way based on the date. You can then have a task which will try and restore old backups into another DB, it can then check this field against the expected value for the date of the backup.
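A sketch of the date-derived check described above. The names `canary_for_date` and `check_restored_canary` are hypothetical, the choice of `cksum` as the derivation is just an assumption for the example, and the actual database restore and query are left out.

```shell
#!/bin/sh
# canary_for_date YYYY-MM-DD   (hypothetical helper)
#   Prints a value derived from the date: the CRC that POSIX cksum
#   reports for the date string. The live database would store this
#   value in its extra table each day.
canary_for_date() {
    printf '%s' "$1" | cksum | cut -d' ' -f1
}

# check_restored_canary BACKUP_DATE RESTORED_VALUE   (hypothetical helper)
#   Succeeds if the value read back from a restored backup matches
#   what the backup's date says it should be.
check_restored_canary() {
    [ "$(canary_for_date "$1")" = "$2" ]
}

# Example: pretend this value was read from the restored database.
restored=$(canary_for_date 2012-09-24)
if check_restored_canary 2012-09-24 "$restored"; then
    echo "restored backup carries the expected canary"
fi
```

A stale or mislabeled backup would carry the canary for a different date and fail the comparison.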

Of course none of these techniques are silver bullets and there are plenty of things that can go wrong, it is certainly prudent to check things manually every once in a while.

Perhaps this API could be modified to take as an input the output from scheduled tasks and check them?

You have to make sure your periodic task acts in such a way that if a piece fails, the rest doesn't execute (using && for instance). Then you would make hitting your special URL the last thing to execute.
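A small sketch of that chaining, with stand-in functions (`step_one`, `step_two`, and `check_in` are placeholders, not real commands) so the short-circuit behaviour can be seen without touching the network:

```shell
#!/bin/sh
# Stand-ins for the pieces of a periodic task (illustration only).
step_one() { echo "step one ok"; }
step_two() { echo "step two failed" >&2; return 1; }
check_in() { echo "checked in"; }   # would be the request to the special URL

# Every step succeeds, so check_in runs last.
step_one && check_in

# step_two fails, so && short-circuits and check_in never runs.
if ! { step_one && step_two && check_in; }; then
    echo "job failed; no check-in sent"
fi
```

In a crontab, reusing the script path and URL quoted elsewhere in this thread, the same idea would read something like `1 23 * * * /home/cron/backup.sh && curl https://nosnch.in/c2354d53d2`.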

If you have a way of checking things like this, great! You should keep doing that!

That is true, but makes the assumption that all scripts along the way will return the correct values on errors etc. Not always true for rushed in-house scripts.

You may also have to consider warning conditions which might be more catastrophic to your "pipe" than the original program would believe.

I run this as a combination of monit timestamp checking and stdout to a file. Is there anything this does that I cannot do with monit?



1 23 * * * /home/cron/backup.sh 1> /backups/backup

and monit doing a

check file backup with path /backups/backup
    if timestamp > 24 hours then alert

This way I am emailed when there is an error with the backup script; otherwise things continue as they are.

This will be AWESOME for monitoring daily backups on the large number of sites/databases that should be getting backed up automatically every night. They work reliably for a while and then I forget about checking them. Would be nice to know when one of those starts to choke.

Those using cron hacks to check on previous cron jobs might want to take a look at Matt Dillon's cron (as opposed to Paul Vixie's cron), which is the crond that Slackware uses.

It has named cron jobs, @noauto and AFTER keywords, which let you run jobs depending on whether previous jobs (identified by name) have run successfully.

That, combined with using things like && and || and mailx, gives you quite powerful ways of checking on previous cron jobs.

Ouch. I've been dabbling at a project that does the exact same thing and your implementation is so perfect that I may give up and sign up :-)

I'm sure this will make tons of money and will be amazingly useful as a hip tool for those developers we all know. But honestly the rest of us build this kind of stuff into our company dashboards without thinking twice. Where it belongs, in the hands of an internal team who can share the monitoring of more than $20 worth of "snitches".

You fail to take into account the small team, whose only objective - or at least that which consumes 99.9% of the brain share - is getting a product out the door and keeping the user-facing part of it running.

Relevant username.

If supporting, contributing to and deploying open source software makes me cheap, then so be it. I'm ecstatic at the chance to share that technology with my clients and to help them give it back where applicable.

I hope your endeavors continue to make you happy with your means as they do mine.

I definitely know quite a few small projects that this could be extremely useful for. However, I think some people are right that Munin works better at tracking these things long term.

And what service do I use to find out when this service doesn't run?

What are you currently using to make sure your periodic tasks run?

As far as making sure DMS runs, we use another third-party to make sure our checkers run. Contact support if you are interested in finding out more.

We have an elaborate system that someone cooked up a while ago that involves a tree of communicating processes running on different machines that monitor our real-time services. If we could consolidate our 'checker' processes into one, that may be the right thing to do. Can the system be set up to use authentication to get behind firewalls, or do the things being monitored have to be accessible over the public Internet?

The interface is a simple HTTPS request. The authentication is done using a unique URL component. This is an example of how to use Dead Man's Snitch:

$ curl https://nosnch.in/c2354d53d2

Dead Man's Snitch does not ping your servers. It is the opposite way around. If your server stops pinging Dead Man's Snitch, you will be notified.

I'm not sure I understand; however, it works by your process reaching out to US to check in. I'd suppose you would maybe have to open up outbound traffic to our domain? I'd love to hear more. Feel free to email hi@deadmanssnitch.com.
