I think there are two kinds of logging that get conflated into one in the industry: logging for devops and logging for analytics.
On devops logging, I 100% agree with you. Looking at application metrics rather than raw logs is far more productive, and the raw logs should only be consulted after you've triaged the situation based on the metrics you monitor.
However, there is another kind of logging, and that's for data science and analytics. Here, it's hugely helpful to have centralized logging. Hell, it is a must. The last thing you want is data scientists with shaky Linux knowledge ssh-ing into your prod machines. At the same time, logs are the best source of customer behavior data to inform product insights, etc. By centralizing these logs and making them available on S3 or HDFS or something, you can point the data scientists there and have everyone win.
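For instance (a minimal sketch, assuming logs have already been shipped to S3 as line-delimited JSON; the bucket, prefix, and field names are all invented for illustration), a data scientist can pull events straight from the archive without ever touching a prod box:

```python
import json

import boto3  # pip install boto3

s3 = boto3.client("s3")

# Hypothetical bucket/prefix where the log collector drops its batches.
BUCKET = "my-log-archive"
PREFIX = "logs/2024-01-01/"

events = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        # One JSON event per line, e.g. {"user_id": ..., "action": ...}
        events.extend(json.loads(line) for line in body.splitlines() if line)

# From here it's ordinary analysis: counts, funnels, cohorts, etc.
actions = {}
for e in events:
    actions[e.get("action")] = actions.get(e.get("action"), 0) + 1
print(actions)
```

No shell accounts, no grep on prod; the logs are just data files in a bucket.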
Among Fluentd users, we definitely see both camps. As a matter of fact, one of the reasons I think people like Fluentd is that it enables both monitoring and log aggregation in a single tool.
You're absolutely right. I was only talking about DevOps logging (should have made that clearer). Logging for data science is a totally different ball game.
Does that change your opinion about having a centralized log data store? If you need it anyway for your data scientists, why not give your security/devops people access to it when they need to debug a problem?
It doesn't change my opinion, because logging for data scientists is different from logging for DevOps. For data science, I assume it's all application events going into a queue or stream processor, being inserted directly into a database, or being pulled out of a database during ETL.
Stuff going to syslog isn't generally going to be used for data science.
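To make that concrete (just a sketch; the broker address, topic name, and event shape are assumptions, and kafka-python is only one of several clients you could use), the analytics path looks less like syslog and more like the app emitting structured events into a stream:

```python
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

# Hypothetical broker and topic; in practice these come from config.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def track(user_id, action, **props):
    """Emit one structured analytics event, separate from operational logs."""
    producer.send("product-events", {
        "ts": time.time(),
        "user_id": user_id,
        "action": action,
        **props,
    })

track(42, "checkout_started", cart_size=3)
producer.flush()  # make sure buffered events actually leave the process
```

The consumers of that stream are warehouses and notebooks, not pagers.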
One alternative is to replicate some of your data into a different DB that is safe for the data scientists to use. And that way, they have raw data to play with instead of logs.
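Something like this, maybe (a toy sketch using sqlite3 for both sides so it actually runs; in reality the source would be your prod Postgres/MySQL and the target a warehouse, and the table/column names here are invented). The key move is copying only the columns that are safe to expose:

```python
import sqlite3

# Stand-ins for the prod DB and the analytics replica.
prod = sqlite3.connect("prod.db")
analytics = sqlite3.connect("analytics.db")

prod.execute("CREATE TABLE IF NOT EXISTS orders "
             "(id INTEGER, user_email TEXT, amount REAL, created_at TEXT)")
prod.execute("INSERT INTO orders VALUES (1, 'a@example.com', 9.99, '2024-01-01')")

# Replica schema deliberately omits PII (no user_email column).
analytics.execute("CREATE TABLE IF NOT EXISTS orders "
                  "(id INTEGER, amount REAL, created_at TEXT)")

# Copy only the safe columns; run this on a schedule (cron, Airflow, etc.).
rows = prod.execute("SELECT id, amount, created_at FROM orders").fetchall()
analytics.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
analytics.commit()

print(analytics.execute("SELECT * FROM orders").fetchall())
```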
Curious what you mean by "some of your data"? What kinds of data? Usually I think of the logs as the raw data; everything else is derived data & analysis.