If anyone is interested in similar challenges in cryo-electron microscopy, where datasets run to dozens of TB, this is a good paper discussing the file formats, header organization, and bit depth required for proper resolution:
I reverse engineered several microscopy formats; it can be a lot of fun. If anyone is interested in contributing to similar efforts, there is also https://openslide.org/
Sheesh, OpenSlide was an absolute game changer when I was writing code to use digital pathology and microscopy data in a browser. They've done some awesome work. I really enjoyed learning about the different formats and can't recommend the resources at OpenSlide enough.
These are both extremely impressive projects. I've used an earlier version of the OpenFlexure stage for an unrelated project, and it was a very easy build. Very well thought out.
https://pubmed.ncbi.nlm.nih.gov/35724904/
What I can say is that TEM data management is unmitigated hell, and thank God for that paper's author (Ludtke) and his software (EMAN2).
Biologists are unsurprisingly awful at data management.