
Ask HN: Can ML/AI make system monitoring better? - ashtavakra
If someone with enough experience in ML/AI were to build an application that aims to 1) reduce the overall number of alerts, 2) figure out which alerts are actionable and which are not, and 3) predict potential issues and suggest remedies, would they succeed? The question is whether someone can actually achieve this with ML/AI in its current state today. If something like this were possible, is it safe to assume that engineering teams at Google, Amazon, Uber, Airbnb, Dropbox, and Netflix would have already implemented it?

I read about two different views on this subject today:

1. Why Machine Learning in Monitoring is BS: https://blog.opsee.com/machine-learning-in-monitoring-is-bs-134e362faee2

2. Why it is not BS: http://mabrek.github.io/blog/machine-learning-is-not-bs/

I was curious to find out what the good people on HN think about this.
======
PaulHoule
It is a tough problem, no doubt, largely because the samples are so unbalanced:
real incidents are rare compared to normal operation.

People who work on anomaly detection in finance (anti-fraud) usually look at
it as a "characterize a normal transaction and reject anything that is far
from the center" problem as opposed to a "classify transactions as good or
bad" problem.
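
A minimal sketch of that "characterize normal, reject what is far from the center" idea, in Python. The feature names, sample values, and the threshold of 4.0 are assumptions made up for illustration, and a real system would likely need something more robust than a per-feature mean and standard deviation.

```python
import math
from statistics import mean, stdev

def fit_normal_profile(samples):
    """Learn a per-feature center and spread from samples of normal behaviour."""
    columns = list(zip(*samples))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) or 1.0 for col in columns]  # fall back to 1.0 if a feature never varies
    return centers, spreads

def distance_from_center(point, centers, spreads):
    """Euclidean distance from the center, measured in standard deviations."""
    return math.sqrt(sum(((x - c) / s) ** 2 for x, c, s in zip(point, centers, spreads)))

def is_anomalous(point, centers, spreads, threshold=4.0):
    """Reject anything farther from the center than the (assumed) threshold."""
    return distance_from_center(point, centers, spreads) > threshold

# Hypothetical (latency_ms, error_rate_percent) samples taken during normal operation
normal = [(100, 1.0), (110, 1.2), (95, 0.8), (105, 1.1), (98, 0.9), (102, 1.0)]
centers, spreads = fit_normal_profile(normal)

print(is_anomalous((101, 1.0), centers, spreads))  # close to the center
print(is_anomalous((400, 9.0), centers, spreads))  # far from the center
```

Note that no "bad" examples are needed to fit the profile, which is what makes this framing attractive when bad samples are scarce.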

Would someone succeed or fail? I think it could go either way. Are you talking
about the general problem or the problem for some particular environment?
(E.g., Netflix certainly does not need to solve the general problem.)

------
solomatov
From what I know, it would work.

Google uses DNN to optimize power consumption in their data centers:
[https://deepmind.com/blog/deepmind-ai-reduces-google-data-
ce...](https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-
cooling-bill-40/)

------
chamoda
Anomaly detection is one use case. Unsupervised learning methods can be put to
good use analysing large logs to spot abnormal CPU usage, user activity, web
traffic, etc.
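
As a rough illustration of the unsupervised angle, here is a small Python sketch that flags CPU readings far from the median of the preceding window, using the median absolute deviation as the scale. The window size, threshold, and sample readings are assumptions chosen for the example, not values from any real system.

```python
from collections import deque
from statistics import median

def rolling_outliers(values, window=10, threshold=5.0):
    """Return the indexes of values far from the median of the preceding window."""
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(values):
        if len(history) == window:
            center = median(history)
            # Median absolute deviation; the tiny fallback avoids dividing by zero
            mad = median(abs(v - center) for v in history) or 1e-9
            if abs(value - center) / mad > threshold:
                flagged.append(i)
        history.append(value)
    return flagged

# Hypothetical CPU usage readings (percent) with one spike
cpu_percent = [21, 23, 22, 24, 22, 21, 23, 22, 24, 23, 22, 95, 23, 22]
print(rolling_outliers(cpu_percent))
```

No labels are required: the recent history itself defines what counts as normal, so the same function could be pointed at request counts or login rates.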

------
thevivekpandey
I believe it is not BS. However, there are currently no good tools that
solve monitoring using AI in an elegant way.

