I think there are two kinds of logging that get conflated into one in the industry: logging for devops and logging for analytics.
On devops logging, I 100% agree with you. Looking at application metrics rather than raw logs is far more productive, and the raw logs should only be consulted after you've triaged the situation based on the metrics you monitor.
However, there is another kind of logging, and that's for data science and analytics. Here, it's hugely helpful to have centralized logging. Hell, it is a must. The last thing you want is data scientists with shaky Linux knowledge ssh-ing into your prod machines. At the same time, logs are the best source of customer behavior data to inform product insights, etc. By centralizing these logs and making them available on S3 or HDFS or something, you can point the data scientists there and have everyone win.
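For instance (a minimal sketch, assuming logs have already been shipped to S3 as line-delimited JSON; the bucket, prefix, and field names are all invented for illustration), a data scientist can pull events straight from the archive without ever touching a prod box:

```python
import json

import boto3  # pip install boto3

s3 = boto3.client("s3")

# Hypothetical bucket/prefix where the log collector drops its batches.
BUCKET = "my-log-archive"
PREFIX = "logs/2024-01-01/"

events = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        # One JSON event per line, e.g. {"user_id": ..., "action": ...}
        events.extend(json.loads(line) for line in body.splitlines() if line)

# From here it's ordinary analysis: counts, funnels, cohorts, etc.
actions = {}
for e in events:
    actions[e.get("action")] = actions.get(e.get("action"), 0) + 1
print(actions)
```

No shell accounts, no grep on prod; the logs are just data files in a bucket.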
Among Fluentd users, we definitely see both camps. As a matter of fact, one of the reasons I think people like Fluentd is that it enables both monitoring and log aggregation in a single tool.
You're absolutely right. I was only talking about DevOps logging (should have made that clearer). Logging for data science is a totally different ball game.
Does that change your opinion about having a centralized log data store? If you need it anyway for your data scientists, why not give your security/devops people access to it when they need to debug a problem?
It doesn't change my opinion, because logging for data scientists is different from logging for DevOps. For data science, I assume it's all application events going into a queue or stream processor, being inserted directly into a database, or being pulled out of a database during ETL.
Stuff going to syslog isn't generally going to be used for data science.
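To make that concrete (just a sketch; the broker address, topic name, and event shape are assumptions, and kafka-python is only one of several clients you could use), the analytics path looks less like syslog and more like the app emitting structured events into a stream:

```python
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

# Hypothetical broker and topic; in practice these come from config.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def track(user_id, action, **props):
    """Emit one structured analytics event, separate from operational logs."""
    producer.send("product-events", {
        "ts": time.time(),
        "user_id": user_id,
        "action": action,
        **props,
    })

track(42, "checkout_started", cart_size=3)
producer.flush()  # make sure buffered events actually leave the process
```

The consumers of that stream are warehouses and notebooks, not pagers.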
One alternative is to replicate some of your data into a different DB that is safe for the data scientists to use. And that way, they have raw data to play with instead of logs.
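Something like this, maybe (a toy sketch using sqlite3 for both sides so it actually runs; in reality the source would be your prod Postgres/MySQL and the target a warehouse, and the table/column names here are invented). The key move is copying only the columns that are safe to expose:

```python
import sqlite3

# Stand-ins for the prod DB and the analytics replica.
prod = sqlite3.connect("prod.db")
analytics = sqlite3.connect("analytics.db")

prod.execute("CREATE TABLE IF NOT EXISTS orders "
             "(id INTEGER, user_email TEXT, amount REAL, created_at TEXT)")
prod.execute("INSERT INTO orders VALUES (1, 'a@example.com', 9.99, '2024-01-01')")

# Replica schema deliberately omits PII (no user_email column).
analytics.execute("CREATE TABLE IF NOT EXISTS orders "
                  "(id INTEGER, amount REAL, created_at TEXT)")

# Copy only the safe columns; run this on a schedule (cron, Airflow, etc.).
rows = prod.execute("SELECT id, amount, created_at FROM orders").fetchall()
analytics.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
analytics.commit()

print(analytics.execute("SELECT * FROM orders").fetchall())
```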
Curious what you mean by "some of your data"? What kinds of data? Usually I think of the logs as the raw data; everything else is derived data & analysis.