
Using AWS Lambda to call and text you when your servers are down - nhm
https://thisdata.com/blog/using-aws-lambda-to-call-and-text-you-when-your-servers-are-down/
======
Dobbs
From an engineering point of view this is really cool, but as an ex-sysadmin I
feel that I need to reiterate and emphasize something that is alluded to in
the second paragraph.

Too many things can go wrong, and you are all-around better off outsourcing
this to something like Pingdom. You don't have sufficient levels of
reliability, and you aren't dual-homed across Twilio and another phone system.
Maybe the cause of your outage is that AWS is having issues. Now your site and
your monitoring is down.

Much better to outsource to people who obsess over doing this right and making
sure they are properly redundant.

~~~
melvinmt
> Now your site and your monitoring is down. Much better to outsource to
> people who obsess over doing this right and making sure they are properly
> redundant.

You make valid points about redundancy and levels of reliability but keep in
mind that even Pingdom can go down:
[http://royal.pingdom.com/2016/10/24/ddos-attack-affects-ping...](http://royal.pingdom.com/2016/10/24/ddos-attack-affects-pingdom)

~~~
user5994461
Chances are that Pingdom won't be down at the same time that your site is
down.

Diversify to avoid cascading failures ;)

------
falcolas
And I'm sure Lambda will never go down. Right? Right??

(It has. Completely and silently stopped processing against Kinesis queues for
a few hours recently. Guess what AWS Step is built on?)

~~~
cddotdotslash
Of course it can go down, and you can have CloudWatch alerts to alert you
about that. But your Nagios server sending pings can go down too, and so can
the fancy SaaS you signed up for.

~~~
robinson-wall
Did you just suggest using a third AWS service to let you know if the second
AWS service monitoring your first AWS service goes down?

~~~
cddotdotslash
Yes, because they're different services running on different architecture and
distributed differently. I challenge you to find one time in the past five
years where CloudWatch was down at the same time as other services. Even if
you can, I'm sure your custom-built Nagios server in your datacenter has gone
down as many or more times too.

But my bigger point here is that you're essentially asking "well how do you
monitor your monitor?" At which point up the chain do you have enough? Also, I
think the original post was simply a demo of what is possible. Yet whenever
someone posts something, people go in the comments to belittle it. "Yeah, you
built a monitoring solution... Well what happens if _that_ goes down?"

Which is a legitimate question. But obviously if your production service is
that critical to your business, you won't be monitoring it with a service that
costs $0.0000002 per execution.

~~~
falcolas
> they're different services running on different architecture and distributed
> differently

I think you underestimate the interdependency of services in AWS.
Historically, if there were problems with S3 or EBS in us-east-1, you could
expect the entire API to be flaky, and things like autoscaling to fail. These
have been better distributed, but failures still cascade.

> I think the original post was simply a demo of what is possible

No, it wasn't a demo, it was an actual production issue. No alarms, no error
logs, no way to tell it wasn't working other than someone noticing the queues
were getting larger and contacting AWS.

> people go in the comments to belittle it

Only because the original post presents AWS Lambda as "the solution" for
such problems, not realizing that it is just as fallible a solution as
everything else.

> Well what happens if that goes down?

The solution to this is well known - two monitoring systems in physically
separate locations that monitor each other as well as mission critical
systems. Nagios, Icinga, and a dozen other well-tested solutions work
remarkably well for these roles, yet people keep writing "new" solutions over
and over and over.

> But obviously if your production service is that critical to your business,
> you won't be monitoring it with [this] service

Then what's its value, other than as an intellectual exercise?

~~~
cddotdotslash
Launch two Lambda functions, heck, 8 Lambda functions, one in each AWS region
that supports it. They all monitor one another, plus run your checks. Next,
are you going to say all 8 regions will go down at once?

The whole setup will still cost $0/month.

> The solution to this is well known - two monitoring systems in physically
> separate locations that monitor each other as well as mission critical
> systems. Nagios, Icinga, and a dozen other well-tested solutions work
> remarkably well for these roles, yet people keep writing "new" solutions
> over and over and over.

Because not everyone needs heavy solutions to do something simple. Side
projects, small sites, etc. And some people enjoy implementing old use cases
using new technology. When Go was rising in popularity, half the posts on the
front page were re-implementing fairly common features in Go.

Even if you're not going to implement this yourself, there can still be some
value for other readers.

~~~
falcolas
> are you going to say all 8 regions will go down at once

I hope not. But then it's not just Lambda triggered by CloudWatch alarms
anymore. You'd probably have to set up something to ensure that Lambda, when
called via CloudWatch alarms, is being triggered properly. Useful, but
suddenly a lot more complicated.

> The whole setup will still cost $0/month.

Unlikely. A small amount, but certainly not 0. Especially when you start
adding Lambda heartbeats.
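
For a rough sense of scale, here is a back-of-the-envelope sketch using the $0.0000002 per-execution figure quoted earlier in the thread. The one-check-per-minute interval and the 30-day month are assumptions, and it ignores duration charges, any free tier, and Twilio fees.

```python
# Per-execution request price quoted earlier in the thread.
PRICE_PER_EXECUTION = 0.0000002


def monthly_invocations(functions: int, checks_per_hour: int, days: int = 30) -> int:
    """Total invocations in a month for a fleet of monitoring functions."""
    return functions * checks_per_hour * 24 * days


def monthly_request_cost(
    functions: int,
    checks_per_hour: int,
    days: int = 30,
    price: float = PRICE_PER_EXECUTION,
) -> float:
    """Request charges only; duration and notification costs are not included."""
    return monthly_invocations(functions, checks_per_hour, days) * price
```

With 8 functions each running once a minute, `monthly_request_cost(8, 60)` covers 345,600 invocations and comes to about seven cents: small, but not literally zero under these assumptions.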

> And some people enjoy implementing old use cases using new technology.

Which is fine; call it an experiment, call it exploration, I have no problem
with that. It's frustrating to see such a stripped down article treating it
like it's going to be _the one_, without reasonable discussions about how it
could fail. There are a minimum of three failure points in this system alone,
with no discussion on how to compensate for them.

------
tjholowaychuk
I wrote Apex Ping ([https://apex.sh/ping/](https://apex.sh/ping/)) for those
who want more features and/or don't want to waste the time to save a few bucks
:D.

~~~
gingerlime
Apex Ping is great, but I'm still waiting for SMS / Twilio integration (hint,
hint, nudge nudge) :)

~~~
tjholowaychuk
:D I wish SMS wasn't so awkward; you're pretty much forced to have a credit
system since it's so expensive. I'll probably still do it at some point. It
makes it awkward for the customer as well if you have to babysit the credits.

~~~
gingerlime
I suggested it before, but I think you can work around it: Let your customers
give you their Twilio API key (with a big disclaimer that any charges by
Twilio are not your responsibility...).

~~~
tjholowaychuk
If I can get costs down elsewhere I'll maybe just do "unlimited" on some of
the accounts, where "unlimited" is some large arbitrary number haha.

~~~
cuu508
That would be the nicest user experience for your users, but it is a bit
risky. You probably have a "reasonable number of notifications per user per
month" in mind. As you sign up new users, you will sooner or later get some
that will exceed that number by a lot--without any malicious intent.

------
tymm
I wrote something similar in Bash and put it into a Docker image:
[https://hub.docker.com/r/simplepush/alerta](https://hub.docker.com/r/simplepush/alerta)

Just running this Docker image on a server you want to monitor is enough.

Instead of Twilio it uses Simplepush
([https://simplepush.io](https://simplepush.io)).

~~~
cyberferret
Simplepush looks like a cool service - thanks for the heads up. It seems that
it accomplishes the author's main need - that for a constant buzzing which
needs to be picked up and dealt with.

EDIT: Just seen that it is Android only! :-/

~~~
ubercow
If you need something similar that works on iOS (and Android), take a look at
[https://pushover.net/](https://pushover.net/)

I use it for some personal automation scripts that might need to get my
attention if something goes wrong.

------
dmourati
Isn't the better plan to use Lambda _instead_ of your servers?

------
justinc8687
I use [https://aremysitesup.com/](https://aremysitesup.com/) and I've found it
really helpful, as it is one of the few inexpensive services I've found that
will CALL me if things are down. SMS is nice, but I use the do-not-disturb
feature on my phone in the evenings, and at least on iOS, the only way to
punch through that is with a call from a number on my favorites list. This
meets that need very well, and I've found the service to be quite spot-on at
alerting me (both when I had one instance of things hitting the fan and
during scheduled maintenance). I'd highly recommend it.

------
intrasight
The advantage of this type of cloud solution over a one-size-fits-all cloud
service like Pingdom (which I use) is flexibility. You can configure cloud
agents to perform nearly any task you can envision.

------
gravypod
I don't know if everyone knows this, but you can send texts using email.

Most providers have SMTP gateways for SMS services. Verizon runs @vtext.com.
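
A minimal sketch of what that looks like with Python's standard library, assuming the gateway accepts mail addressed to the bare ten-digit number at the gateway domain. The sender address, the local SMTP relay, and the 160-character truncation are assumptions for illustration, not something any carrier guarantees.

```python
import smtplib
from email.message import EmailMessage


def build_sms_email(
    phone_number: str,
    body: str,
    sender: str,
    gateway_domain: str = "vtext.com",
) -> EmailMessage:
    """Build an email addressed to a carrier's email-to-SMS gateway."""
    # Keep only the digits so "(555) 010-0123" becomes "5550100123".
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{digits}@{gateway_domain}"
    # Assumed limit: trim the text to a single SMS worth of characters.
    msg.set_content(body[:160])
    return msg


def send_sms_email(msg: EmailMessage, smtp_host: str = "localhost", smtp_port: int = 25) -> None:
    """Hand the message to an SMTP relay (hypothetical local relay by default)."""
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.send_message(msg)
```

Something like `send_sms_email(build_sms_email("555-010-0123", "site is down", "alerts@example.com"))` would then hand the alert to whatever relay you have available.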

~~~
gtaylor
Just keep in mind that these aren't incredibly reliable across the board.
Others have very low or arbitrary autoban or blacklist thresholds. I
eventually caved and paid Twilio to deal with SMS logistics for me, rather
than put up with the weirdness.

------
social_quotient
Note: instead of DynamoDB for the lookup mentioned at the bottom, maybe
consider [https://aws.amazon.com/athena/](https://aws.amazon.com/athena/) for
an S3 query.

~~~
illumin8
You could literally just store a .CSV file in S3 with a table that has the
on-call schedule in it and run SQL queries against it with Athena. That would
be cheap, since you'd be querying a few KB, but DynamoDB is probably better
for this use case, honestly. Athena is great for scanning huge datasets very
quickly.
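
For a file that small, a third option is to skip both Athena and DynamoDB and parse the CSV directly. A minimal sketch, assuming hypothetical `start`, `end`, and `phone` columns with ISO dates; fetching the file from S3 is left out, so the function just takes the CSV text.

```python
import csv
import io
from datetime import date
from typing import Optional


def on_call_phone(schedule_csv: str, day: date) -> Optional[str]:
    """Return the phone number of whoever is on call on `day`, if anyone.

    Expects a header row with the (hypothetical) columns start, end, phone,
    where start and end are inclusive ISO dates such as 2017-01-09.
    """
    reader = csv.DictReader(io.StringIO(schedule_csv))
    for row in reader:
        start = date.fromisoformat(row["start"])
        end = date.fromisoformat(row["end"])
        if start <= day <= end:
            return row["phone"]
    # Nobody is scheduled for that day.
    return None
```

The first matching row wins, so overlapping shifts would need an explicit rule if the schedule allows them.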

------
cagataygurturk
Route53 Health Checks & SNS can send an SMS message without any Lambda involved.

------
theparanoid
I've used montastic.com for years. 2min setup to fire and forget.

