Currently, we have no elegant way to achieve what you want.
When failures occur, repair is done by workers that log when they launch, when they repair chunks, and when they exit. We also have `garage status` and `garage stats`: the first command displays healthy and unhealthy nodes, and the second displays the queue lengths of our tables and chunks. If any of these values is greater than zero, the cluster is still being repaired.
We are also writing up failure recovery in our documentation: https://garagehq.deuxfleurs.fr/documentation/cookbook/recove...
Exposing Prometheus metrics is ongoing work that we haven't had much time to advance yet. For now, we can check replication progress and overall cluster health by inspecting Garage's state from the command line.
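Until metrics land, the command-line check can be scripted. Below is a rough sketch in Python that polls `garage stats` and waits until the queues are empty. The output format is an assumption on my part: the regex simply looks for lines that mention "queue" and end in an integer, and the helper names (`pending_items`, `read_stats`, `wait_until_idle`) are made up for illustration, so adjust them to what your Garage version actually prints.

```python
import re
import subprocess
import time

# ASSUMPTION: queue counters show up on lines containing the word "queue"
# and ending in an integer. This is not a documented format; adapt the
# pattern to the real output of `garage stats` on your cluster.
QUEUE_LINE = re.compile(r"queue.*?(\d+)\s*$", re.IGNORECASE)


def pending_items(stats_output: str) -> int:
    """Sum the trailing integers of every line that mentions a queue."""
    total = 0
    for line in stats_output.splitlines():
        match = QUEUE_LINE.search(line)
        if match:
            total += int(match.group(1))
    return total


def read_stats() -> str:
    """Run `garage stats` and return its standard output as text."""
    result = subprocess.run(
        ["garage", "stats"], capture_output=True, text=True, check=True
    )
    return result.stdout


def wait_until_idle(read=read_stats, poll_seconds: float = 30.0) -> None:
    """Poll until no queued items remain, i.e. repair appears finished."""
    while True:
        remaining = pending_items(read())
        if remaining == 0:
            return
        print(f"{remaining} queued items left, cluster still repairing")
        time.sleep(poll_seconds)
```

Calling `wait_until_idle()` on a node where the `garage` binary is on the PATH would block until every matched queue reads zero. It says nothing about node health, so `garage status` is still worth checking separately.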