
Ask HN: Which tool do you use to monitor your system and application? - rishiloyola
Hello. I used the well-known ELK stack to monitor syslog and trigger alarms. I would like to know about alternative tools, whether language-specific or not.
======
tnolet
I’d be silly not to mention what I use at Checkly for production monitoring
and simultaneously harass you guys with a covert product pitch:

- Infra metrics & graphing (AWS, Heroku and custom metrics) on AppOptics
(formerly Librato)

- Some metrics on Prometheus

- Logs on Papertrail

- Some AWS native CloudWatch alerts

- Sentry for backend error tracking

- Cronhub for a single background job

- API monitoring with our own product

- Synthetic monitoring using Puppeteer scripts with our own product.

I love Datadog in general but it’s too expensive at this stage.

Full disclosure: founder of a monitoring SaaS
[https://checklyhq.com](https://checklyhq.com)

------
drchaos
Solo SaaS app dev here. I found Sentry[1] to be the most valuable tool for
monitoring what's going on inside my (Django/React) app. Sentry catches every
exception/error in front- and backend, then collects and aggregates them, so I
get a single ticket/notification for every type of issue, instead of thousands
of those for every instance of the same problem.
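To illustrate the aggregation idea described above, here is a minimal Python sketch. It is not Sentry's actual grouping algorithm; the `fingerprint` function, the `ErrorAggregator` class and the `parse_age` helper are hypothetical names, and the grouping key (exception type plus innermost function name) is an assumption chosen only to show how many occurrences can collapse into one issue.

```python
import traceback
from collections import defaultdict


def fingerprint(exc):
    """Hypothetical grouping key: exception type plus innermost function name."""
    frames = traceback.extract_tb(exc.__traceback__)
    where = frames[-1].name if frames else "<unknown>"
    return (type(exc).__name__, where)


class ErrorAggregator:
    """Keeps one bucket of occurrences per fingerprint."""

    def __init__(self):
        self.issues = defaultdict(list)

    def capture(self, exc):
        """Store the exception; return True only the first time this issue is seen."""
        key = fingerprint(exc)
        is_new = key not in self.issues
        self.issues[key].append(str(exc))
        return is_new


def parse_age(value):
    # Hypothetical application code that fails on bad input.
    return int(value)


aggregator = ErrorAggregator()
notifications = 0
for raw in ["abc", "x1", "??", "12"]:
    try:
        parse_age(raw)
    except ValueError as exc:
        # Notify only when a new kind of issue appears, not on every occurrence.
        if aggregator.capture(exc):
            notifications += 1
```

In this sketch three bad inputs raise the same kind of error in the same function, so they land in a single bucket and would produce one notification rather than three.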

For me, that's much simpler than trying to log each error and analyze the log
later on. I still create application logs, but only read them if I need more
information about a problem logged by Sentry (most of the time I don't need
that at all, because Sentry collects a lot of context already).

Another plus for Sentry is that it can be self-hosted, which makes GDPR
compliance a little bit easier.

Besides that, I use Monitor Scout[2] for checking that my app is still up, and
fail2ban[3] to get rid of script kiddies trying to brute-force passwords and
stuff.

[1] [https://sentry.io/](https://sentry.io/)
[2] [https://www.monitorscout.com](https://www.monitorscout.com)
[3] [https://www.fail2ban.org/](https://www.fail2ban.org/)

------
spondyl
At Xero, we use Sumo Logic for logging and New Relic for monitoring,
infrastructure tracking and a whole bunch of other things.

We were featured in a Sumo Logic case study[1] and we've given talks in
partnership with New Relic[2], so you can find out more there if you're
interested.

[1] [https://www.sumologic.com/case-study/xero/](https://www.sumologic.com/case-study/xero/)
[2] [https://www.youtube.com/watch?v=QPIpqx47CCY](https://www.youtube.com/watch?v=QPIpqx47CCY)

------
twunde
It's a fairly crowded space. Two of the newer monitoring stacks are Prometheus
+ Grafana and InfluxData's TICK stack, both of which are metric-based
monitoring systems and tend to be used with containers. Older open source
systems include Sensu and Nagios, which are check-based systems. Sentry and
Rollbar are examples of exception/error-based monitoring systems. There are a
number of commercial solutions available that fall into one of these
categories.
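As a rough illustration of the difference between the first two categories, here is a minimal Python sketch. The `Counter` class, the metric name `http_requests_total` and the `check_disk` thresholds are hypothetical and do not use any real client library. The status codes follow the Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL), and the metric line imitates the simple `name value` form of a text-based metrics endpoint.

```python
class Counter:
    """Metric-based: the application only records numbers.

    A separate system scrapes or receives the values and decides later,
    from the time series, whether something is wrong.
    """

    def __init__(self, name):
        self.name = name
        self.value = 0

    def inc(self, amount=1):
        self.value += amount

    def expose(self):
        # One "name value" line, as a scraper might read it.
        return f"{self.name} {self.value}"


def check_disk(used_percent, warn=80.0, crit=90.0):
    """Check-based: each run returns a verdict right away.

    Returns (status, message) with 0 = OK, 1 = WARNING, 2 = CRITICAL.
    """
    if used_percent >= crit:
        return 2, f"CRITICAL: disk {used_percent:.0f}% used"
    if used_percent >= warn:
        return 1, f"WARNING: disk {used_percent:.0f}% used"
    return 0, f"OK: disk {used_percent:.0f}% used"


requests = Counter("http_requests_total")
for _ in range(3):
    requests.inc()

metric_line = requests.expose()
status, message = check_disk(95)
```

The metric side carries no notion of good or bad by itself, while the check side bakes the thresholds into the check and reports a state.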

------
nodesocket
Big fan of Datadog logs[1] in tandem with alerts[2]. You can push metrics/logs
from cloud provider native resources, containers, or the application level via
SDKs.

Of course, it is a fully managed, paid service, and at huge scale it is
probably cost-prohibitive.

      [1] https://docs.datadoghq.com/logs/
      [2] https://www.datadoghq.com/alerts/

------
dankohn1
Here's a fairly comprehensive list:
[https://landscape.cncf.io/category=monitoring&format=card-mo...](https://landscape.cncf.io/category=monitoring&format=card-mode&grouping=category)

------
sethammons
We do a lot with Splunk and Graphite/Grafana. More and more, we are
incorporating Sysdig and LightStep. We integrate these systems with PagerDuty.

------
shric
We use ELK for logging and Prometheus/Alertmanager/Grafana for
monitoring/alerting/dashboards.

------
aprdm
We use Sensu and ELK.

