
Ask HN: Do you monitor your REST APIs? - googlycooly
I have a lot of APIs for different apps that I've built for customers.

And most of the time it is when the customer calls me that I get to know the API is down!

Maybe after some data changes, or some random code changes.

I'm now having nightmares about API services going down.

Do you continuously monitor your APIs and get notified when something goes down? (It's not just monitoring a website with Pingdom, but the actual data responses, e.g. checking whether a particular JSON field exists in the response.)
======
davismwfl
Yes, for every API I build I set up a few specific heartbeat-style endpoints.
I have done this since my consulting days, and I still do it at the company I
am at now.

1. Heartbeat which checks that the service is responding. Used by HTTP
monitoring services, mainly to make sure the route is up.

2. Heartbeat which checks that the service is up and validates all database
connections are up and I can get data from the database (usually a trivial
query on a small table). Used as the primary detection for internal failures.

3. For any 3rd party services I depend on, I set up a heartbeat endpoint that
checks the service is up (but not necessarily giving me good data). Usually I
group them, and sometimes I group and separate them under e.g.
/heartbeat/services, /heartbeat/service1, /heartbeat/service2. Sometimes you
can validate that the service is returning good data, but it isn't always
easy to do that, so I do what I can.

4. I set up a 3rd party service to monitor the heartbeats and the return
codes, to validate they are up and properly returning what I expect, and to
notify me if not. I don't have to do sophisticated response processing at the
3rd party service because I can just use HTTP return codes 99% of the time.
The detailed response checking is done at the heartbeat level, then a
response code is generated. And of course, any failure to respond shows too.

This is still not perfect, but it has proven to make sure we know before
anyone else when something fails. I still have one product that we haven't
converted to this process yet, but we are migrating to a new version that has
these checks, so the faster that happens the better I will sleep.

One key thing is not to make the check interval too aggressive: the general
HTTP check is hit a lot by the load balancers, but the others are spread out
much more to avoid creating artificial load. When we build an independent
service (microservice etc.) I make sure it has these same checks, although
they might not be HTTP based. But since they share the same basic
methodology, a service watcher can remove any instance from the registry
after some configured number of failures & retries etc.
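
The layered checks above can be sketched in a few lines of Python. This is a minimal, illustrative version of checks 2 and 4's status-code contract, not the commenter's actual code; an in-memory sqlite database stands in for a real one, and all names are made up:

```python
import sqlite3

def check_db():
    # Heartbeat #2: a trivial query to prove the database connection
    # works (an in-memory sqlite DB stands in for the real one here).
    try:
        con = sqlite3.connect(":memory:")
        con.execute("SELECT 1")
        con.close()
        return True
    except sqlite3.Error:
        return False

def heartbeat():
    # Aggregate the individual checks and map them to an HTTP status,
    # so the external monitor (step 4) only has to look at the code.
    checks = {"db": check_db()}
    status = 200 if all(checks.values()) else 503
    return status, checks
```

The point of returning 503 on any internal failure is that the external monitoring service never needs to parse the body; the status code carries the verdict.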

*edit a few words

~~~
googlycooly
Sounds good. Do you use any platform to setup this monitor?

~~~
davismwfl
I have used lots of different ones over the years, right now I am using one I
don't want to mention simply because we won't be staying on it ourselves (and
I won't recommend something I won't use).

That said, there are lots of services that do it well. The key, of course, is
that the service itself has to be reputable and solid; not knocking anyone's
homespun version, but your monitoring is only as reliable as their service.
This is why we are going to move ourselves again.

As for ones I have used in the past, "uptime" worked well for a couple of my
clients; it was reliable and stable.

------
futhey
Low-tech / small-scale solution: Similar to what you're doing, UptimeRobot
lets us monitor and alert on status codes for free, which works for a lot of
simple APIs. I also write a few simple tests for my most important API routes
(sort of like a heartbeat or a self-check / test) that return a 500 on failure
(when health of the actual API might not be surfaced by a simpler test). 50
"tests" for free goes pretty far.

I also tried a product recently that I really liked,
[https://checklyhq.com/](https://checklyhq.com/). They'll give you more
advanced ways of vetting your API responses from multiple locations (along
with averaging request time and monitoring that).

~~~
tnolet
Checkly founder here! We are a dedicated API monitoring solution mixed with
synthetic monitoring using Puppeteer scripts. A bit like if RunScope,
GhostInspector and Pingdom had a baby. And the baby was born in a Javascript,
Cloud Native, DevOps world...

As the parent mentions, we give you much more advanced tools for customizing,
monitoring and alerting than many simpler products, such as setup/teardown
scripts, Puppeteer, GitHub, Prometheus and other integrations.

------
kccqzy
Yes, and from my previous experience I find it helpful to have both black-box
monitoring and white-box monitoring.

For black box monitoring we just set up a prober that runs periodically and
sends requests. It then checks responses to see if they are what is expected.
Bonus if you place multiple such probers across the globe; that also
exercises your load balancing and tests the geographic replication of your
services.
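
A minimal black-box prober along these lines, checking for the kind of "does this JSON field exist" condition the OP asked about, might look like this (a sketch, not anyone's production code; the `fetch` parameter is injectable so it can be exercised offline):

```python
import json
import urllib.request

def probe(url, required_field, fetch=None, timeout=5):
    # Black-box check: fetch the endpoint and confirm the expected
    # JSON field is present, not just that the route answered.
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u, timeout=timeout).read()
    try:
        body = json.loads(fetch(url))
    except Exception as exc:
        return False, f"request failed: {exc}"
    if required_field not in body:
        return False, f"missing field {required_field!r}"
    return True, "ok"
```

Run it from cron (or from several regions, per the comment above) and alert whenever the first element of the result is False.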

For white box monitoring we instrumented the code itself to export information
about events and metrics. For example, application-level things like the
metadata of each request and response, response status, time to generate the
response, internal errors encountered; system-level things like memory
allocation and CPU time for the container; and dependencies like database
query times, and the durations and statuses of external requests, etc. We used
[http://riemann.io/](http://riemann.io/) to collect and process these streams
and set up alerts. I find it really powerful to adopt this paradigm where
streams of data are exported from your app and processed externally; though
getting used to the stream processing mentality could be something extra to
learn.
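
The white-box side can be approximated with a small decorator that exports an event per request. This is a generic sketch of the idea, not the Riemann integration itself; here the events just accumulate in a list where a real setup would ship them to a stream processor:

```python
import time

events = []  # stand-in for a stream shipped to Riemann or similar

def instrumented(handler):
    # Wrap a handler so every call exports an event with its name,
    # outcome and duration, in the white-box style described above.
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        state = "error"
        try:
            result = handler(*args, **kwargs)
            state = "ok"
            return result
        finally:
            events.append({
                "service": handler.__name__,
                "state": state,
                "duration_ms": (time.monotonic() - start) * 1000,
            })
    return wrapper

@instrumented
def get_user(user_id):
    # Hypothetical request handler, just to show the decorator in use.
    return {"id": user_id}
```

The external processor then does the alerting (error-rate spikes, latency percentiles), keeping that logic out of the app.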

------
time0ut
My general approach is to create monitors (in something like Splunk or ELK)
that watch logs and fire alerts (email, SMS, PagerDuty, etc) if their
conditions are met.

I create monitors for health issues like watching for out of memory or pod
failures. I create monitors that compute the error rate and trend for each
endpoint and alert if it crosses a threshold. Similarly, I'll create monitors
for dead letter queues or email send failures or anything else that might go
wrong in an app.

This may sound like a lot of monitors, but I try to log things in common ways,
so a handful of monitors can watch hundreds of endpoints or queues.
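
The "log things in common ways" idea can be as simple as emitting one JSON shape everywhere; the field names below are illustrative, not a standard:

```python
import json

def log_line(endpoint, status, duration_ms, **extra):
    # One common JSON shape for every endpoint, so a single monitor
    # (e.g. "rate of status >= 500 over threshold") covers all of them.
    return json.dumps({"endpoint": endpoint, "status": status,
                       "duration_ms": duration_ms, **extra})
```

With a consistent shape, a Splunk or ELK query can group by `endpoint` and compute error rates for hundreds of routes from one saved search.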

Finally, for complicated mission critical systems, I build in support for
synthetic transactions that avoid undesired side effects. These may generate
extra trace logs in the app. Such requests are submitted on a regular schedule
and the input and output logged. Then I build more monitors on these logs.

------
janober
I use [https://n8n.io](https://n8n.io) (full disclosure, I am the creator) for
it. It is free and fair-code licensed. I did not write it specifically for
this use case, but I have to say it works very well.

------
zwetan
Yep, I use Google Analytics

server-side you can track using the measurement protocol

few years ago I did a prototype test with PHP, example here
[https://pastebin.com/PQCRcJXq](https://pastebin.com/PQCRcJXq)

with something like Slim PHP you can add a middleware and automatically track
everything, but you can also customize on a needed basis

I use the same logic with different PL on different backends etc.

for a starter it is cheap to implement and put in place, and covers almost
everything
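
For reference, a Measurement Protocol hit is just a form-encoded payload POSTed to Google's collect endpoint. The sketch below assumes the v1 (Universal Analytics) protocol that the linked PHP prototype appears to use, with a placeholder tracking ID; it only builds the payload and leaves the HTTP call out:

```python
from urllib.parse import urlencode

def ga_event_payload(tracking_id, client_id, category, action, label=None):
    # Build a Measurement Protocol v1 "event" hit; POSTing this body
    # to https://www.google-analytics.com/collect records it.
    params = {"v": "1", "tid": tracking_id, "cid": client_id,
              "t": "event", "ec": category, "ea": action}
    if label is not None:
        params["el"] = label
    return urlencode(params)
```

A middleware (as with Slim in PHP) would call this once per request, using the route and status code as category/action.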

------
vivekf
We built a common layer into all our APIs that records the HTTP status code
of each response to a Redis counter. We have a monitor job that runs every
minute checking the error % (200 vs. others) and raises an alert when the
threshold is exceeded. This way we get to know about API failure errors and
potential security issues, such as the % of HTTP 403s returned.

We also monitor the % of requests logged every minute, and if that drops by,
say, 50% we know something is down.
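
The counter-plus-threshold logic can be sketched like this (an in-memory `Counter` stands in for the Redis counters, and the 5% threshold is just an example, not the commenter's actual value):

```python
from collections import Counter

status_counts = Counter()  # in-memory stand-in for the Redis counters

def record(status_code):
    # Called from the common layer on every response.
    status_counts[status_code] += 1

def error_rate():
    # Share of non-2xx responses in the current window.
    total = sum(status_counts.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in status_counts.items()
                 if not 200 <= code < 300)
    return errors / total

def should_alert(threshold=0.05):
    # The per-minute monitor job: compare the error % to a threshold.
    return error_rate() > threshold
```

In a real setup, the counters would be keyed by minute (e.g. `INCR status:500:2024-01-01T10:05`) so each window can be evaluated and expired independently.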

------
nitwit005
What I've done in the past is build a library for functional testing of the
API. You can use that library for writing functional tests, and to create an
API status test that runs periodically to provide monitoring.

This generally does require that your APIs have some idea of multi-tenancy, as
you don't want your tests modifying some customer's data.
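
One way this dual use can look (a hypothetical sketch; the tenant name, flow and fake client are all made up for illustration):

```python
TEST_TENANT = "monitoring-tenant"  # hypothetical tenant reserved for probes

def create_and_fetch_widget(client, tenant=TEST_TENANT):
    # One flow from the shared test library: run by the functional
    # test suite in CI, and on a schedule as a status check. Writes
    # go to the dedicated tenant so no customer data is touched.
    created = client.post(f"/tenants/{tenant}/widgets", {"name": "probe"})
    fetched = client.get(f"/tenants/{tenant}/widgets/{created['id']}")
    return fetched["name"] == "probe"

class InMemoryClient:
    # Tiny fake HTTP client so the sketch runs without a server.
    def __init__(self):
        self.widgets, self.next_id = {}, 1
    def post(self, path, body):
        wid, self.next_id = self.next_id, self.next_id + 1
        self.widgets[wid] = dict(body, id=wid)
        return self.widgets[wid]
    def get(self, path):
        return self.widgets[int(path.rsplit("/", 1)[1])]
```

The same function passes or fails whether pytest or a cron job invokes it, which is what makes the shared library approach pay off.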

------
googlycooly
Or is it like, I'm missing an intermediate step that will solve this
"monitoring" problem for me?

------
pieterhg
Yep, I use [http://uptimerobot.com](http://uptimerobot.com) for it. It's
mostly just keyword alerts, where it expects a certain keyword to be in the
reply. If it isn't there, the site is probably down and it alerts me. I get
the alerts via Telegram.

------
jozi9
Shameless plug but I created an app exactly for this purpose:
[https://www.apilope.com](https://www.apilope.com)

You can schedule test flows and validate responses as well.

Edit: my cert is expired, I’ll fix this (now that I have a lot of time to
spare:)

~~~
purrcat259
FYI, I got an "SSL cert expired" error when accessing the URL you provided.

------
nreece
One of our APIs is powered by a cloud function that we monitor (and keep it
warm to avoid cold start time) using
[https://www.statuscake.com](https://www.statuscake.com)

