It isn’t claiming scientific precision; it’s more of an aggregated risk indicator.
Improving the explainability of that score is high on the list. I hope it helps :)
It’s more of an engineering diagnostic heuristic than a psychological metric.
The goal is practical usefulness, not theoretical rigor.
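To make "aggregated risk indicator" concrete, here's a minimal sketch of one common approach: a weighted sum of normalized heuristic signals. The signal names and weights below are purely illustrative assumptions, not the tool's actual internals.

```python
# Hypothetical aggregation of heuristic signals into one 0-1 risk score.
# Signal names and weights are illustrative only.
SIGNAL_WEIGHTS = {
    "repeated_user_explanations": 0.40,  # user re-states earlier context
    "dropped_entity_references": 0.35,   # agent loses track of named items
    "contradictory_answers": 0.25,       # agent contradicts its own output
}

def risk_score(signals: dict) -> float:
    """Combine heuristic signals (each clamped to [0, 1]) into one indicator."""
    total = sum(
        SIGNAL_WEIGHTS[name] * min(max(value, 0.0), 1.0)
        for name, value in signals.items()
        if name in SIGNAL_WEIGHTS
    )
    return round(total, 3)
```

The upside of a scheme like this is that the score decomposes: you can report each weighted term alongside the total, which is one path toward the explainability mentioned above.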
Curious to hear:
• How are people currently debugging agent continuity issues?
• Are you seeing users re-explain things often?
Also happy to analyze any sample transcripts if people want to try it.