If you are writing server software, I recommend Michael Nygard's book "Release It!: Design and Deploy Production-Ready Software". Measuring event loop lag sounds a lot like Nygard's "Circuit Breaker" pattern for avoiding cascading failures.
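
For anyone who hasn't run into the Circuit Breaker pattern, here is a rough sketch in Python. This is my own illustration, not code from the book: the names (CircuitBreaker, CircuitOpenError, failure_threshold, reset_timeout) are made up, and it assumes a single-threaded caller. The idea is that after enough consecutive failures the breaker "opens" and rejects calls immediately instead of letting them pile up on a struggling dependency, then lets one trial call through after a cool-down.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    # Hypothetical names and defaults, chosen only for illustration.
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so it can be tested
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast without touching the dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: "half-open", so one trial call goes through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Trip the breaker (or re-open it after a failed trial call).
                self.opened_at = self.clock()
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

You would wrap each call to the flaky dependency, e.g. breaker.call(fetch_user, user_id) where fetch_user is whatever function talks to the remote service, and treat CircuitOpenError as "shed this request now". A real implementation would also need locking if it is shared across threads.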

The book's examples are all in Java, but the lessons apply anywhere. He offers many scalability patterns (such as resource pools) and anti-patterns (such as runaway log files), with interesting stories from his experience debugging real systems. I especially liked his story about debugging a crash in an Oracle DB driver that caused unexpected Java exceptions to be thrown from java.sql.Statement.close(), which quickly blocked a DB connection pool.


