The goal is to have more confidence in deployed code. More specifically, it's to detect failures that are hard to find: anything that "fails" but doesn't crash.
The concept itself is pretty simple. When implementing a new feature:
1. Log the major steps of the feature (Python's logging, Ruby's Logger, etc.)
2. Write a simple validation scenario that "scans" the logs and searches for the main log entries of your feature (beginning, end, etc.)
3. Tell this hypothetical library/platform to analyze the logs against those validation scenarios
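For illustration, here is a minimal sketch of step 1 for a hypothetical signup feature, using Python's logging; the function names and log messages are invented for the example:

```python
import logging

logger = logging.getLogger("signup")

def sign_up(user_id: int, email: str) -> None:
    # Step 1: log the major steps of the feature
    logger.info("User %s signed up", user_id)
    send_welcome_email(user_id, email)

def send_welcome_email(user_id: int, email: str) -> None:
    # ... hand the email off to your mail provider here ...
    logger.info("User %s - sent welcome email", user_id)
```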
Example of the scenario concept (a rough code sketch follows the list):
- If you find "User 245 signed up" in the logs
- then, no later than 30 seconds afterwards,
- you should also find "User 245 - sent welcome email"
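To make the idea concrete, here is a minimal sketch of what evaluating such a scenario could look like, assuming the logs are already available as (timestamp, message) pairs; the regex, the scenario constants, and `check_scenario` are made-up names for illustration, not an actual API:

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# A hypothetical encoding of the scenario above:
# "if you see the trigger line, expect the follow-up line within 30 seconds"
TRIGGER = re.compile(r"User (?P<id>\d+) signed up")
FOLLOW_UP = "User {id} - sent welcome email"
WINDOW = timedelta(seconds=30)

def check_scenario(entries: Iterable[Tuple[datetime, str]]) -> List[str]:
    """Scan (timestamp, message) log entries and describe any scenario violations."""
    entries = list(entries)
    failures = []
    for i, (ts, msg) in enumerate(entries):
        match = TRIGGER.search(msg)
        if not match:
            continue
        expected = FOLLOW_UP.format(id=match.group("id"))
        deadline = ts + WINDOW
        found = any(expected in later_msg and later_ts <= deadline
                    for later_ts, later_msg in entries[i + 1:])
        if not found:
            failures.append(f"expected '{expected}' within 30s of '{msg}'")
    return failures
```

A real platform would presumably parse timestamps out of an aggregated log stream and let you express scenarios declaratively rather than in code, but the underlying check is roughly this simple.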
The concept sounds interesting to me because:
- it tackles issues that are currently difficult to monitor (failures that DON'T crash)
- it's useful both pre-deployment and post-deployment (less effort spent on testing)
- it can be used to instrument integration/e2e/manual tests (see the sketch after this list)
- it works for apps of any scale or architecture (a monolith or 1,000 microservices), because logs can be aggregated into a single stream
- it helps ensure that business-relevant or user-level features keep working at scale
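As a rough illustration of the testing point above, the same scenario check could be reused inside an integration test; this sketch assumes pytest's caplog fixture plus the hypothetical sign_up and check_scenario functions from the earlier snippets:

```python
import logging
from datetime import datetime

def test_signup_sends_welcome_email(caplog):
    # Run the feature under test while pytest captures its log output
    with caplog.at_level(logging.INFO):
        sign_up(245, "user245@example.com")

    # Replay the captured entries through the same validation scenario
    entries = [(datetime.fromtimestamp(record.created), record.getMessage())
               for record in caplog.records]
    assert check_scenario(entries) == []
```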
Any downsides to this idea?
I made a landing page to detail the idea a bit more: logscan.io (I don't know if I'm allowed to post it here; I'll remove it if not).