
Ask HN: What features do you need for uptime monitoring of 50 endpoints? - strobe
If you use an uptime monitoring service with more than 50 monitors, which service characteristics are critical when choosing a provider?
======
sethammons
The number of endpoints is not something I think about. One service I run has
two endpoints, another has several hundred, another reads from a queue.

There are standard metrics we monitor, and they go well beyond a heartbeat or
health check endpoint with a status for each dependency. We monitor success
and error counts, counts of response codes, cache hit/miss ratio, latency,
time spent on networked resources, time spent doing complex computation, load
balance between regions, queue depths, and then specific metadata like user
ID, payload details like types of parameters used, size of requests, method of
authentication, user agent, etc.
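
As a rough illustration of the counts and latencies listed above, here is a
minimal in-memory sketch in Python. The `Metrics` class, its method names, and
the choice to treat status codes of 500 and above as errors are assumptions
made for the example, not any particular monitoring product's API.

```python
from collections import Counter, defaultdict


class Metrics:
    """Tiny in-memory store: one counter per (endpoint, status) label pair."""

    def __init__(self):
        self.counters = Counter()
        self.latencies_ms = defaultdict(list)

    def record(self, endpoint, status, latency_ms):
        # The endpoint is just a label; adding endpoints adds series, not code.
        self.counters[(endpoint, status)] += 1
        self.latencies_ms[endpoint].append(latency_ms)

    def error_ratio(self, endpoint):
        # Assumption: a response with status >= 500 counts as an error.
        total = sum(n for (ep, _), n in self.counters.items() if ep == endpoint)
        errors = sum(
            n for (ep, st), n in self.counters.items()
            if ep == endpoint and st >= 500
        )
        return errors / total if total else 0.0


metrics = Metrics()
metrics.record("/users", 200, 12.0)
metrics.record("/users", 500, 30.0)
print(metrics.error_ratio("/users"))
```

A real system would ship these to a time-series store rather than keep them in
process memory; the point is only that everything is keyed by labels.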

The key here is that the number of endpoints is not interesting. We just use a
label and filter on that. What is interesting is how the metrics scale:
requests per second, higher-cardinality labels, data aggregation over time,
retention time, the ability to set alerts, trend analysis that can alert if
this Tuesday morning's graphs are odd compared to other Tuesday mornings,
handling math functions like derivatives, sums, and percentiles, etc.
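
To make the "odd Tuesday morning" idea concrete, here is one simple way it
could be done, sketched in Python: compare the current value against the same
time slot in previous weeks and flag it when it is several standard deviations
away. The function names, the threshold of 3.0, and the nearest-rank
percentile are assumptions chosen for the example; real tools may use quite
different methods.

```python
import math
import statistics


def is_anomalous(current, history, threshold=3.0):
    """Flag `current` if it is far from the mean of the same slot in past weeks."""
    if len(history) < 2:
        return False  # not enough history to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > threshold


def percentile(values, p):
    """Nearest-rank percentile, e.g. p=95 for p95 latency."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Requests per second on previous Tuesday mornings (made-up numbers).
past_tuesdays = [100, 102, 98, 101, 99]
print(is_anomalous(101, past_tuesdays))  # within normal variation
print(is_anomalous(150, past_tuesdays))  # far outside it
```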

If you are purely looking for what is important for an uptime service, the
minimum is that it alerts me if a heartbeat fails for too long. But I would
only use such a service on a hobby project. If an endpoint is in production,
I want all the metrics I mentioned earlier as a minimum.
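
For reference, the bare-minimum heartbeat check can be sketched in a few lines
of Python. `HeartbeatMonitor`, the 120-second default, and the decision not to
alert before the first heartbeat arrives are all assumptions for illustration.

```python
import time


class HeartbeatMonitor:
    """Alert when no heartbeat has been seen for longer than max_silence_s."""

    def __init__(self, max_silence_s=120.0):
        self.max_silence_s = max_silence_s
        self.last_beat = None

    def beat(self, now=None):
        # Record the time of the latest successful heartbeat.
        self.last_beat = time.time() if now is None else now

    def should_alert(self, now=None):
        now = time.time() if now is None else now
        if self.last_beat is None:
            return False  # assumption: stay quiet until the first beat
        return (now - self.last_beat) > self.max_silence_s


monitor = HeartbeatMonitor(max_silence_s=60)
monitor.beat(now=1000)
print(monitor.should_alert(now=1030))  # 30 s of silence
print(monitor.should_alert(now=1100))  # 100 s of silence
```

Passing `now` explicitly keeps the check easy to test; in normal use both
methods fall back to the wall clock.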

~~~
strobe
Thanks for the reply.

I'm just wondering why uptime monitoring SaaS services base their pricing on
the number of monitors (25, 50, etc.). I get that they are trying to separate
hobby users from companies, but to me it sounds more reasonable to have tiers
based on the number of servers, IPs, or projects (limited by some resource
cap), or tiers based on additional features like gRPC or GraphQL support,
alert filtering, etc.

Also, if you have a serious project, it usually makes sense to set up rich
monitoring tools, so maybe users of such services have specific use cases.
They probably have something like a lot of simple sites, or need to cover API
endpoints with monitors but don't want to use complex tools.

