Hacker News new | past | comments | ask | show | jobs | submit login

Tooling requires access. Are you saying no dev should ever `strace` a process? (This requires not only access, but presuming my UID != the service UID for the service, sudo, too.)

Note that I'm not saying devs should have access to every production machine; I'm only saying that access should be granted to devs for what they are responsible for maintaining.

Sure, one can write custom tooling to promote chosen pieces of information out of the system into other, managed systems. E.g., piping logs out to ELK. And we do this. But it often is not sufficient, and production incidents might end up involving information that I did not think to capture at the time I wrote the code.

Certain queries might fail, only on particular data. That data may or may not be logged, and root-causing the failure will necessitate figuring out what that data is.

And it may not be possible to add it to the code at the time of the incident; yes, later one might come back and capture that information in a more formal channel or tool now that one has the benefit of hindsight, but at the time of the outage, priority number one is always to restore the service to a functioning state. Deploying in the middle of that might be risky, or simply make the issue worse, particularly when you do not know what the issue is. (Which, since we're discussing introspecting the system, I think is almost always the part of the outage where you don't yet know what is wrong.)




Applications are open for YC Winter 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: