
Ask HN: How do you monitor your websites? - skies
What tools do you use to monitor uptime of your web apps and/or APIs? Also, how do you track SSL/domain name expiry?
======
jongleberry
At Dollar Shave Club, we use CircleCI 2's Scheduled Workflows to run
"monitors" against our production services every minute. These are idempotent,
analytics-disabled API & Browser (via Puppeteer) tests that we also run as CI
tests on every commit.

We send all monitor metrics to DataDog. When a monitor fails, the appropriate
teams will get a Slack notification with the full stack trace. A DataDog
monitor will also be triggered, alerting the appropriate teams.

For browser monitors, we upload screenshots and Puppeteer tracing files to S3,
then share links within each Slack hook. This allows people to figure out
what's going on just by clicking links in Slack.

We were planning to improve this setup in the future, but it's good enough for
us right now. For example, CircleCI degrades frequently so we sometimes
get spotty coverage. We basically spend < $200/month with CircleCI to monitor
about 300 APIs/pages every minute.

You can read more here:

\- [https://engineering.dollarshaveclub.com/monitor-all-the-
thin...](https://engineering.dollarshaveclub.com/monitor-all-the-
things-4b6f88a922e6)

\- [https://circleci.com/blog/how-dollar-shave-club-3x-d-
velocit...](https://circleci.com/blog/how-dollar-shave-club-3x-d-velocity-and-
learned-love-tests/)

\-
[https://github.com/dollarshaveclub/monitor](https://github.com/dollarshaveclub/monitor)

------
js4ever
UptimeRobot is quick to set up and has a lot of options, including a white
label portal [0] to monitor status. It's reliable and cheap ($54/year).

I'm not affiliated with them, just a happy customer.

[0] E.g. status page:
[https://status.appdrag.com/](https://status.appdrag.com/)

~~~
gregopet
I've had a bad experience with Uptime Robot - it only checked our server via
IPv4. We didn't know IPv6 was down for far too long (it was a DNS problem).

~~~
timbit42
Sounds like you didn't check that they checked for IPv6. Businesses aren't
going to tell you where their services are lacking.

------
wilsonnb3
Once every few months, I go to the website.

~~~
cascada
Really??? Teach me! How do you manage doing that???

~~~
danillonunes
Step 1: Go to the website.

Step 2: After a few months, repeat step 1.

------
philip1209
I basically add a /global-health endpoint on my server. It executes a bunch of
checks programmatically - e.g. database connection, rendering working, etc. It
would be easy to add in "fail if cert expires in next month".

Then, I monitor just that one endpoint with Stackdriver (because it's easy).
If any check fails, it logs the failure, prints details, and returns a 500
status code. Adding new checks is just a code change.
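
A minimal sketch of that kind of endpoint, using only the Python standard library. The check functions, the `CHECKS` registry and the JSON response shape are hypothetical placeholders, not the poster's actual code; real checks would talk to the database, render a template, and so on.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple


def check_database() -> None:
    """Hypothetical check: raise an exception if the database is unreachable."""
    pass


def check_rendering() -> None:
    """Hypothetical check: make sure basic string templating still works."""
    if "Hello, {}".format("world") != "Hello, world":
        raise RuntimeError("template rendering broken")


# Adding a new check is just adding an entry here.
CHECKS: Dict[str, Callable[[], None]] = {
    "database": check_database,
    "rendering": check_rendering,
}


def run_checks(checks: Dict[str, Callable[[], None]]) -> Tuple[int, dict]:
    """Run every check and return (HTTP status code, JSON-serialisable report)."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # any failure makes the endpoint unhealthy
            results[name] = f"failed: {exc}"
    healthy = all(value == "ok" for value in results.values())
    return (200 if healthy else 500), {"healthy": healthy, "checks": results}


class HealthHandler(BaseHTTPRequestHandler):
    """Serves GET /global-health with 200 or 500 plus per-check details."""

    def do_GET(self):
        if self.path != "/global-health":
            self.send_error(404)
            return
        status, report = run_checks(CHECKS)
        body = json.dumps(report).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()` (from `http.server`) would serve it; the external monitor then only needs to look at the status code.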

~~~
Spivak
Are you worried about that endpoint being used for DoS or is it light enough?

~~~
iw0
The way I do it is to run the checks in the background every x minutes and
write the result to a JSON file. It’s fast and safe.

~~~
petecoop
I'd guess you'd then also want to return a timestamp in the response and then
fail if the timestamp is older than x minutes too
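
A rough sketch of the cached-results idea with the staleness check, assuming a 10-minute limit and a made-up file layout (`timestamp` plus a `checks` mapping). The function names are illustrative only.

```python
import json
import time
from pathlib import Path

MAX_AGE_SECONDS = 10 * 60  # assumption: results older than 10 minutes are stale


def write_results(path: Path, results: dict, now: float = None) -> None:
    """Called by the background job after it has run the (expensive) checks."""
    payload = {
        "timestamp": time.time() if now is None else now,
        "checks": results,
    }
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(path)  # rename over the old file so readers never see half a file


def read_status(path: Path, now: float = None, max_age: float = MAX_AGE_SECONDS):
    """Called by the health endpoint: cheap, runs no checks itself."""
    now = time.time() if now is None else now
    try:
        payload = json.loads(path.read_text())
    except (OSError, ValueError):
        return 500, {"error": "no readable results file"}
    age = now - payload["timestamp"]
    if age > max_age:
        # The background job itself has probably died, so fail loudly.
        return 500, {"error": f"results are stale ({age:.0f}s old)"}
    healthy = all(value == "ok" for value in payload["checks"].values())
    return (200 if healthy else 500), payload
```

Because the endpoint only reads a small file, hammering it does not trigger any real work, and the timestamp check catches the case where the background job silently stops.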

------
mindcrime
_What tools do you use to monitor uptime of your web apps and/or APIs?_

I use a custom AWS Lambda function. It fires every four hours or so, and tries
to make an https connection to each configured URL (the URLs are stored in a
file in S3) and if the site is either down, or if there is an SSL error (which
probably means an expired certificate) then it sends me a text message using
SNS.
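
For illustration only, here is roughly what that logic looks like as a Python sketch (the actual function is Java and is not shown in this thread). The `notify` callback stands in for the SNS text message, and the `opener` parameter is an assumption added so the logic can be exercised without a network.

```python
import ssl
import urllib.error
import urllib.request


def classify_failure(exc: Exception) -> str:
    """Map an exception raised while fetching a URL to 'ssl-error' or 'down'."""
    if isinstance(exc, ssl.SSLError):
        return "ssl-error"
    # urllib wraps low-level errors in URLError and keeps the cause in .reason
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, ssl.SSLError):
        return "ssl-error"  # often an expired or mismatched certificate
    return "down"


def check_url(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> str:
    """Return 'ok', 'ssl-error' or 'down' for a single URL."""
    try:
        with opener(url, timeout=timeout) as response:
            return "ok" if response.status < 400 else "down"
    except Exception as exc:
        return classify_failure(exc)


def check_all(urls, notify=print, **kwargs) -> dict:
    """Check every URL and call notify(message) once per failing site."""
    failures = {}
    for url in urls:
        status = check_url(url, **kwargs)
        if status != "ok":
            failures[url] = status
            notify(f"{url}: {status}")
    return failures
```

In a scheduled function, `urls` would come from the configuration file and `notify` would publish the message to whatever alerting channel is in use.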

The whole thing is about 50 lines of code, and that's in Java. And it doesn't
even come close to exceeding the free tier limit of Lambda calls, so it
doesn't even cost anything so far.

To be fair, I could have used a 3rd party service, but writing this thing was
my first foray into using Lambda, so I did it as much for the learning
experience as anything. But it works really well, so I doubt I'll replace it
anytime soon.

~~~
exceptionallyOK
Just wondering, did you use Scheduled Events or what did you use to implement
firing on an interval?

~~~
mindcrime
Yeah, it's just using the standard AWS scheduled events stuff. Nothing fancy.
Then again, our needs (for now) are pretty simple. I basically just want to
know if our website goes down or if our SSL certificate expires or something.

One of the sites is a SaaS offering, but it's not live yet, so I don't need to
stay super-on-top-of-it. Once it's live we'll want more frequent monitoring
and some other stuff, so we might either move to another approach, or
supplement this with something else.

------
keithwhor
If you want a light (and free) DIY solution, it's pretty straightforward to
build a basic uptime monitor (with connections to Slack, SMS, whatever) using
Standard Library [0] and Scheduled Tasks. One of our engineers just posted
this article (you can build from your browser):

[https://hackernoon.com/build-an-uptime-monitor-in-minutes-
wi...](https://hackernoon.com/build-an-uptime-monitor-in-minutes-with-slack-
standard-library-and-node-js-e7973d659849)

Another option if you don't feel in the mood for DIY is TJ Holowaychuk's Apex
Ping: [https://apex.sh/ping/](https://apex.sh/ping/). Great service, run by a
solo developer, reasonable price.

[0] [https://stdlib.com](https://stdlib.com)

Disclaimer: I founded Standard Library. :)

~~~
SahAssar
Sorry but that has got to be one of the worst company/product names ever.
Either it is completely un-searchable since literally every programming
language has a "standard library" or if it actually takes off it will push
down actually relevant results for niche programming language docs.

~~~
keithwhor
Yes — you’re not the first person to come at me with a pitchfork for the
domain and name and won’t be the last. I do appreciate the feedback, but we’re
committed to the name. :)

We haven’t historically had a problem with “stdlib”, we’re already the top
Google result. “Standard Library” (full name) is new for us as we expand to a
less technical cohort of customers. We’re working with some pretty great
people and companies (Stripe, Slack) on our mission to build a, well, Standard
Library — so if you can get over the name choice you should check out our
online development environment! ->
[https://code.stdlib.com/](https://code.stdlib.com/)

------
CodeAlong
We have been fairly happy with Runscope [0] for fairly simplistic monitoring
of api response codes and body payloads. My biggest complaint is probably the
lack of individualized response data after >= 24 hours.

Still waiting to see if the CA Technologies acquisition [1] makes things worse
or not.

[0] [https://www.runscope.com/](https://www.runscope.com/) [1]
[https://blog.runscope.com/posts/301](https://blog.runscope.com/posts/301)

~~~
johns
Note that the retention is not time-based:
[https://www.runscope.com/support/kb/test-result-
retention](https://www.runscope.com/support/kb/test-result-retention)
Hopefully we have a better solution for this in the future.

There are more resources working on Runscope than ever before. CA continues to
invest in stability, new features and support. CA is also going through an
acquisition and that introduces more variables, but as of today (1+ year after
acquiring us), CA has been extremely supportive of Runscope.

------
raresp
Try Protectumus: [https://protectumus.com](https://protectumus.com)

Protectumus monitors website uptime, speed, and DNS changes, scans the website
for malware like a traditional antivirus, and blocks bad bots, custom IPs and
countries. Protectumus acts as a firewall. We are the only security company
specialized in SEO security, and we offer unique SEO services such as search
engine cloaking monitoring, Google DMCA complaints, blacklist monitoring &
removal and more.

Details here -
[https://news.ycombinator.com/item?id=18295381](https://news.ycombinator.com/item?id=18295381)

Full disclosure: I'm the founder.

~~~
jamieweb
From your site:

> Protectumus acts as a web application firewall (WAF) and scans the website
> for known malware. Once the malware is found it will be automatically
> removed.

How does a WAF remove malware? Is there an additional agent sitting on the
server or something? And surely when a server is pwned, it needs to be
reformatted, not just the malware 'removed'.

~~~
raresp
Thanks for your question. We have both a Firewall and a traditional Antivirus.
They work separately.

The firewall blocks bad bots, hack attempts like SQL injections, XSS, CSRF and
more.

The antivirus scans for known malware (we have a big list of malware
definitions), but we use AI and Machine Learning and the Antivirus learns from
previous detections and is able to act alone. The script is able to
automatically remove the malware once it is found.

~~~
jamieweb
Thank you for the info.

~~~
raresp
In case you're interested in my product I prepared a 100% discount coupon,
when you register use this coupon code: 193AE710F68353B8C17774D73BE52466

------
m_x_m
I use uptimerobot.com to monitor uptime for personal or client stuff that
isn't really mission-critical.

It's not perfect but I hardly have issues with it.

Honestly, I don't do expiry checks...

------
cpburns2009
To piggy back on the question, has anyone had a good experience using
Prometheus and Grafana for monitoring? I'm looking into trying it. I've looked
into Zenoss but from what I gather it's slow.

~~~
mkez00
I love it. Coupled with Alertmanager it's a really easy to use and powerful
platform. We also started serving custom Prometheus exporters for our product
usage which has been helpful and really easy to implement.

------
kels
I have no affiliation with any of these companies.

I use StatusCake for basic uptime monitoring for websites. I switched to them
from Pingdom because they are cheaper. The only downside I've had with
StatusCake is that when something is down it doesn't give you the cause;
Pingdom would show you the traceroute. That made it hard to tell what was
wrong, and I would get quite a few false positives saying sites were down. I
haven't had false positive issues for months now though. Their paid plans have
SSL/domain monitoring.

Monitis for cheap Linux CPU/RAM/Load monitoring.

~~~
abrongersma
I've also had great success with StatusCake. The SSL cert monitoring is a
lifesaver.

------
castis
I have Telegraf running on all my machines that pipes data to an InfluxDB for
storage and then Grafana for visualizations.

Gives me all sorts of useful information that I use to make decisions.

~~~
unixhero
This is such a cool way of doing monitoring.

~~~
h1d
You need to realize health checking doesn't need to be cool. It's a good
solution to check the statistics by human eyes but there are simpler and more
effective solutions for health checking.

------
jwklemm
If you're looking beyond uptime + certs, we do functional + visual browser
testing at [https://ghostinspector.com/](https://ghostinspector.com/). Lots of
folks use it for monitoring their website or application (in addition to
their CI process). We have a free tier that includes scheduling. [Disclosure:
I'm the founder]

~~~
thestepafter
I have been using Ghost Inspector for a while now, and so far it has been
fantastic. It is really nice to be able to push to a branch and get notified
in Slack a couple seconds later with any issues including screenshots.

~~~
jwklemm
\o/ Really glad to hear that! Reach out if we can ever help with anything.

------
cure
Nagios, in combination with check_mk 'raw edition'
([https://mathias-kettner.com/editions.html](https://mathias-kettner.com/editions.html)). The
Nagios configuration is automatically generated via Puppet resource
collection.

SSL certificate expiry is easily checked with a nagios check (use the -C flag
on check_http). If you use Letsencrypt with a client like acmetool
([https://github.com/hlandau/acme](https://github.com/hlandau/acme)) your
certs will never expire. Of course the nagios check is still necessary to
ensure acmetool keeps doing its job!
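
If you'd rather script the expiry check yourself, the same idea can be sketched in a few lines of Python. This is not how check_http is implemented; the 30-day threshold and the function names are assumptions for illustration.

```python
import socket
import ssl
import time


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the server certificate's notAfter field as reported by the ssl module."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: float = None) -> float:
    """Convert a notAfter string into the number of days left (negative if past)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires_at - now) / 86400


def cert_is_expiring(not_after: str, warn_days: int = 30, now: float = None) -> bool:
    """True when the certificate has fewer than warn_days days left."""
    return days_until_expiry(not_after, now) < warn_days
```

Note that with the default context an already-expired certificate makes the handshake in `fetch_not_after` fail with an `ssl.SSLError`, which a monitor should treat as an alert in its own right.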

Domain name expiry checking could also be a nagios job, or alternatively you
could write a small script that checks whois output and execute it regularly
with cron.

Configuring your registrar for auto-renewal helps avoid a certain class of
errors ("I forgot to renew!") but not others ("my credit card expired and the
e-mail notifications from my registrar didn't reach me").

~~~
h1d
Is there a reason to pick Nagios today other than "I know it well"?

------
exabrial
TICK stack: Telegraf, InfluxDB, Chronograf, Kapacitor. They have a massive
array of plugins that can watch/store/graph/alert on just about anything.

[https://www.influxdata.com/time-series-
platform/](https://www.influxdata.com/time-series-platform/)

~~~
justin_oaks
I agree with this. I just set this up at work and it's really cool. Still in
active development and not fully mature yet, but gets the job done and is
pretty good quality.

I wish the documentation was better but telegraf's documentation is light
years ahead of collectd, which is similar software.

Kapacitor needs some more examples and the default Chronograf-generated
TICKscript needed to be thoroughly modified to meet my needs. It took me way
too long to figure out how to use stateChangesOnly() to prevent me from
getting constant notifications once something went into an alarm state.

That said, the stack works well, even if it has a few rough edges. Thanks to
Influxdata for the open source stuff. High quality open source software makes
me want to endorse them and purchase the paid products.

~~~
exabrial
Agree on the TickScript criticism! Just not enough stack overflow answers to
go around. Btw, stateChangesOnly() can take a time argument which is really
really handy!
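
For anyone unfamiliar with the idea, here is a language-neutral sketch in Python (not TICKscript, and not Kapacitor's implementation) of what state-changes-only alerting does: suppress repeats while the level stays the same, optionally re-sending after a maximum interval. The class name and the assumption that the initial level is "OK" are mine.

```python
class StateChangeFilter:
    """Decide whether an alert event should be sent or suppressed."""

    def __init__(self, max_interval: float = None, initial_level: str = "OK"):
        self.max_interval = max_interval  # seconds; None means never re-send
        self.last_level = initial_level
        self.last_sent_at = None

    def should_send(self, level: str, now: float) -> bool:
        changed = level != self.last_level
        # Re-send an unchanged state only if max_interval has elapsed
        # since the last alert that was actually sent.
        overdue = (
            self.max_interval is not None
            and self.last_sent_at is not None
            and now - self.last_sent_at >= self.max_interval
        )
        self.last_level = level
        if changed or overdue:
            self.last_sent_at = now
            return True
        return False
```

Feeding it `OK, CRITICAL, CRITICAL, CRITICAL, OK` sends only on the second and last events; with a `max_interval` set, a long-running CRITICAL also produces periodic reminders.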

------
josefresco
Pingdom, SiteUptime, and Montastic. None of these are relied upon for mission
critical monitoring. I set them up as a sort of "canary in the coal mine" for
each of my servers, to help alert me to issues.

I also created a very hacky browser start page which has GIFs that are
pulled from each of my servers. Sort of like how "game copy world" used to
set up their mirror page. Super unsophisticated but it allows me to do an
"uptime check" every time I open my browser.

------
jozi9
I see that it's trending but without any comments - so allow me a shameless
plug, I created a tool to monitor my APIs (can schedule calls, do response
content checks, send alerts etc):
[http://www.apilope.com](http://www.apilope.com)

If you drop me a line after you signed up I can flag you as a demo user that's
free forever - or at least until you want to pay or cancel :)

------
mvanbaak
Amazon Route 53 checks for monitoring. Cert renewal is via either AWS Certificate
Manager or Let's Encrypt (depending on the use case), so it’s handled automatically.

------
mikedh
I run Selenium integration tests inside a docker container every 5 minutes or
so and attach the results to a sentry.io logger. I put up a boilerplate
version on github: [https://github.com/mikedh/selenium-
simple](https://github.com/mikedh/selenium-simple)

~~~
tnolet
This is almost literally what I used as an inspiration for the site
transaction monitoring part of [https://checklyhq.com](https://checklyhq.com),
as in a browser emulation feeding a monitoring system at regular intervals.
Selenium was always a bit of a hog so I jumped on Puppeteer when it came around.

------
tomspeak
I use Ping [0], never used an alternative so not sure how it compares on
features, but it has everything I need: uptime alerts, header & body requests.
All packaged in a nice interface with a solid pricing structure.

[0] [https://apex.sh/ping/](https://apex.sh/ping/)

------
jimsmart
Answering the second part of your question:—

Re SSL expiry: we've taken the route of running VirtualMin on our 'commodity'
servers. Sites set up in there have an auto-renew policy if they use SSL, so it
requires no action and auto-updates whenever certs expire (IIRC it's every
three months? but I might be wrong: it requires no thought nor action from
me).

For domain names, we use Joker (Swiss company), we've found them to be good in
all aspects, and they send renewal notices well in advance (including a link
that anyone can use to renew the domain, even without having an account with
them).

Uptime monitoring is a whole different kettle of fish, and we manage it on a
per client basis depending on needs. For the sites we host, generally if the
server's up, then the sites are up — but we also do more specific monitoring
if it's a client requirement.

------
marcrosoft
uptimerobot.com. Use automatic Let's Encrypt certs and forget about it.

~~~
mtarnovan
I'm using the same, but I'm looking around for alternatives after I received a
couple of false alerts (about 3-4 over the last year) from them. Still,
overall a good service.

------
WildGreenLeave
I was unable to find a solution that was able to monitor my websites and
servers using a single service, so I decided to develop something[0] to
scratch my own itch. It has been open for almost a year now and it is slowly
gaining some traction. :) It allows you to monitor your server for resources,
your websites for uptime, SSL certificates, broken links and mixed content
errors. It also has a public status page for you to use. Also the setup is
extremely easy as it is just a single binary (and it is open source!) to run
using your own crontab.

Feel free to let me know what you think about it, feedback is always greatly
appreciated.

Disclaimer: I'm the developer.

[0] [https://servitor.io](https://servitor.io)

------
corobo
I've written custom bash scripts that Zabbix runs and alerts based on their
output.

For example, domain expiry - I have a script that Zabbix runs once a day that
does a whois and grabs the expiry date for the domain in question. Convert
that to a Unix timestamp and subtract the current date from it. Echo that from the
script.

In Zabbix we can now alert if that item's value is <30d. Similar for SSL
certificates, and the web monitoring stuff is built in.
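
The whois part can be sketched like this in Python (the original is bash). The two field labels in the regex are assumptions that match common ICANN-style output; other registries use different labels and date formats, which is exactly the pain described in the edit further down.

```python
import re
import subprocess
from datetime import datetime, timezone

# Assumed field labels: whois output is not standardised, so extend as needed.
EXPIRY_FIELD = re.compile(
    r"^\s*(?:Registry Expiry Date|Registrar Registration Expiration Date):\s*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def run_whois(domain: str) -> str:
    """Shell out to the system whois client (it must be installed)."""
    completed = subprocess.run(
        ["whois", domain], capture_output=True, text=True, timeout=30
    )
    return completed.stdout


def parse_expiry(whois_text: str) -> datetime:
    """Extract the expiry timestamp from raw whois output."""
    match = EXPIRY_FIELD.search(whois_text)
    if match is None:
        raise ValueError("no recognised expiry field in whois output")
    stamp = match.group(1).replace("Z", "+00:00")  # make the UTC marker ISO-parseable
    expiry = datetime.fromisoformat(stamp)
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # assume UTC when unspecified
    return expiry


def days_left(whois_text: str, now: datetime) -> float:
    """Days until expiry; this is the number the monitoring item would report."""
    return (parse_expiry(whois_text) - now).total_seconds() / 86400
```

A daily job would print `days_left(run_whois("example.com"), datetime.now(timezone.utc))` and let the monitoring system alert when the value drops below 30.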

Edit: Oh actually on the whois, I remember it was a huge pain in the butt
getting the expiry for a variety of different domains - I now use
[https://jsonwhoisapi.com/](https://jsonwhoisapi.com/) to get the whois info

------
8ytecoder
External monitoring: Pingdom/Site24x7. Lesson learnt - have the alerts route
to at least a few emails outside of your company domain if you use the same
domain for email.

Site monitoring: NewRelic

PagerDuty/OpsGenie: For alert routing if you have more than 2 people.

------
scosman
\- Custom health check endpoint: checks a bunch of internal status metrics
(NewRelic metrics for speed and error rate, DB connection check, a few E2E
tests). Returns a simple overall status: OK, Warning (error rate high but not
critical, slower than usual response times), Critical (DB down, errors high,
anything else). Benefit of this approach is that it's easy to add new health
checks any time you add a feature (example: is redis up?).

\- Pingdom: polls custom endpoint every minute. Sends notification to
PagerDuty if critical, or if warning for more than 5 mins.

\- PagerDuty to notify team

\- SSL Expiry: calendar notifications (whole team) and reminder emails from
our SSL cert issuer
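
A small sketch of the OK / Warning / Critical roll-up and the "page if critical, or if warning for more than 5 minutes" rule. The names are hypothetical, and in the setup described above the timing rule lives in Pingdom/PagerDuty configuration rather than in code; this just makes the logic explicit.

```python
from enum import IntEnum


class Status(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def overall(statuses) -> Status:
    """The endpoint's overall status is the worst individual check result."""
    return max(statuses, default=Status.OK)


WARNING_GRACE_SECONDS = 5 * 60


class Escalation:
    """Page at once on CRITICAL, or once WARNING has lasted longer than the grace period."""

    def __init__(self, grace: float = WARNING_GRACE_SECONDS):
        self.grace = grace
        self.warning_since = None  # start time of the current WARNING streak

    def should_page(self, status: Status, now: float) -> bool:
        if status == Status.WARNING:
            if self.warning_since is None:
                self.warning_since = now
            return now - self.warning_since > self.grace
        # Any non-WARNING result ends the current WARNING streak.
        self.warning_since = None
        return status == Status.CRITICAL
```

New health checks only need to return a `Status`; the roll-up and escalation logic stay unchanged.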

------
quizme2000
I started using Site 24x7 from Zoho about a year ago for all of my apps and my
clients' "basic" websites.

It ended up being a great tool for me because it allows the most basic
ping tests and content checks to be set up in 10 minutes. It also has the
ability to add reporting based on apps, servers, and databases. The AWS
add-ins helped me tune my usage, as I was paying way too much for a couple of
services that I could downgrade without impacting my apps.

I feel it is priced right and a good value up to the 89/month plan.

------
tnolet
People in the market for API monitoring and site transaction monitoring, have
a look at [https://checklyhq.com](https://checklyhq.com). We offer full API
monitoring and add Puppeteer-based site transaction monitoring. We aim to be a
one-stop-shop, so we give you a big dashboard and nice things like SSL certs
monitoring and SMS alerting.

Full disclosure: I'm the founder.

------
DougN7
I use PA Server Monitor. The Web Page monitor can check a URL, submitting form
data if configured, and check for text on the returned page.

It can also alert if the SSL cert is within X days of expiring.

[https://www.poweradmin.com/help/pa-server-
monitor-7-1/monito...](https://www.poweradmin.com/help/pa-server-
monitor-7-1/monitor_web_page.aspx)

------
reefoctopus
I have uptimerobot.com send me an email when it’s down. That email is
forwarded to my cell phone via text using [phonenumber]@txt.att.net.

------
romdev
Many of our larger clients use
[https://www.splunk.com](https://www.splunk.com) for enterprise-level
monitoring, alerting, and log analysis. It's proven effective to debug issues
in environments where I don't have direct log access.

------
lrpublic
[https://nodeping.com](https://nodeping.com) \- cheap, has an api, can graph a
simple JSON response in addition to ICMP, http etc. Generally reliable and
we’ve been using it as our backstop monitoring for hundreds of nodes for
several years.

------
hkchad
DataDog. Our API is 100% serverless microservices (AWS Lambda) and 90% of them
connect to Elasticsearch / Dynamo. If we start getting high error rates an
alarm goes off and Slack / email lights up. We are monitoring upwards of 300
Lambdas this way.

------
jamieweb
KeyChest.net is great for managing your TLS certificates.

It can use the certificate transparency (CT) logs to detect new certificates
for your domain, so once set up you don't have to maintain it. Make sure to
enable the weekly report email too.

------
ArtWomb
Virtually every monitoring tool has already been mentioned. But I just want to
add that you can always get CollectD and StatsD up and running in seconds for
free on linux hosts. Very lightweight, and can measure virtually anything.

------
ajawee
For uptime monitoring: [https://uptimerobot.com](https://uptimerobot.com)

For backend and frontend monitoring: [https://atatus.com](https://atatus.com)

------
petecooper
Not websites as such, but I tend to monitor the underlying web servers and
services with netdata.

[https://github.com/netdata/netdata](https://github.com/netdata/netdata)

------
shanecleveland
I have very simple needs in this regard, so a very simple tool does the trick
and does it very well: [https://servercheck.in](https://servercheck.in)

------
jgallias
[https://hetrixtools.com](https://hetrixtools.com) and
[https://watchful.li](https://watchful.li)

------
cascada
For my needs -- several websites -- a free and simple solution will
be enough. Pings a few times per day would do. Even once per day might do.

What are free solutions?

One is Google Docs. What else?

------
rorykoehler
New Relic is great for APM, uptime etc

SSL & domain name on auto-renew. SSL via Let's Encrypt with a renewal cron and
domain is set to auto-renew in domain registrar dashboard.

------
marlinsearch
[https://www.statuscake.com/](https://www.statuscake.com/) free account has
been working fine for me.

------
aequitas
I discovered earlier this week that Keybase.io also offers pseudo monitoring
by emailing me that the proof on my website was no longer valid due to broken
https.

------
panda888888
We have everything set to give us email notifications and have a group
(softwareadmin@company.com) that the emails go to.

------
nrjames
We use Monit

[https://mmonit.com/monit/](https://mmonit.com/monit/)

~~~
h1d
Love its simple config language and ease of setup.

------
sparrish
NodePing has powerful monitoring for web app, APIs, and websockets. Checks for
SSL and domain expiry as well.

~~~
Fileformat
I recommend NodePing as well. [https://nodeping.com/](https://nodeping.com/)

A ton of different types of checks. A lot of value for not a lot of money. The
public status page style is minimalistic but exactly what I want.

Example: [https://status.regexplanet.com/](https://status.regexplanet.com/)

------
utternerd
Along with all our other services, they're monitored by our off-prem Icinga
instance.

------
z3t4
I installed a node.js module that takes a screenshot of a web page, a module
that compares two images, and a module that sends e-mails. Glued it together,
put it on a $1/month VPS. I now get an e-mail every time a web page that I'm
interested in changes.
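
The "did it change since last time" step can be sketched in Python as below. Note the swap: this compares a SHA-256 hash of whatever bytes you fetched (HTML or a screenshot file) rather than doing a pixel-level image comparison, so it flags any byte-level difference. The class and method names are made up for the example.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Stable fingerprint of a page body or screenshot file."""
    return hashlib.sha256(content).hexdigest()


class ChangeDetector:
    """Remembers the last fingerprint per URL and reports when it differs."""

    def __init__(self):
        self.seen = {}

    def has_changed(self, url: str, content: bytes) -> bool:
        new = fingerprint(content)
        old = self.seen.get(url)
        self.seen[url] = new
        # The first sighting of a URL is a baseline, not a change.
        return old is not None and old != new
```

Whenever `has_changed` returns `True`, the glue script would send the e-mail.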

~~~
elorant
what's the name of the module that takes screenshots?

~~~
gyaru
Wild guess, maybe using puppeteer and its page.screenshot.

[https://github.com/GoogleChrome/puppeteer/blob/v1.9.0/docs/a...](https://github.com/GoogleChrome/puppeteer/blob/v1.9.0/docs/api.md#pagescreenshotoptions)

------
cozuya
I get pinged through a discord webhook when my site crashes (restarts).

------
jdlyga
I have a super tiny personal website run on a VPS. I use uptimerobot.

------
yandexbaidu
We wait until the CIO calls since he gets all of the alerts anyway.

------
xtralife
You can try WebGazer. It's a very user-friendly website monitoring tool and
it's cheap. I didn't get any false alarms.

[https://www.webgazer.io/](https://www.webgazer.io/)

------
dddw
keychest.net for SSL expiry. Great free service, please consider donating to
it!

------
vax425
Montastic. Simple. $5.

------
elcomet
I use datadog

------
shawn
[https://updown.io/](https://updown.io/)

Best value of any tool I've ever used. It does literally everything you asked.
I didn't even know it checked SSL expiry till it pinged me.

~~~
simlevesque
UpDown is nice but I'm considering moving because they don't support multiple
domains on the same status page. I don't need a single status page for each of
the domains I own, that would be ridiculous.

------
wpmoradi
I am curious to see other people's solutions.

