"During the initial recovery, some of our nodes ran out of disk space. It's unclear why this happened since our cluster was only operating at 67% utilization before the initial event, but it's believed this is related to the high load and old Java version. The elasticsearch team continues to investigate to understand the exact circumstances."

This doesn't pass the smell test for me. Not that I know tons about Elasticsearch, but couldn't the disk space have been consumed by failed replication attempts?

