That was the first thing I checked, and it looks like they’re using an existing Python package to parse docx files. I wonder whether they contributed to it or vetted it thoroughly.
Looking at the code, it looks like they used existing Python packages to read and parse MS Office formats. That's not what I expected: since the repo is in Microsoft's org on GitHub, I expected them to use Microsoft's "official" libraries for parsing these formats, through the Component Object Model (COM).
They used Mammoth for docx (Word) [1][2]
python-pptx for pptx (PowerPoint) [3][4]
and pandas for XLSX (Excel) [5]
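
For anyone curious what that looks like without COM, here is a rough sketch of dispatching on file extension to those three libraries. This is not the repo's actual code: the names BACKENDS, pick_backend and to_html_or_text are made up for illustration, and I'm assuming mammoth, python-pptx, pandas and openpyxl are installed.

```python
from pathlib import Path

# Hypothetical mapping from extension to the library mentioned above.
BACKENDS = {".docx": "mammoth", ".pptx": "python-pptx", ".xlsx": "pandas"}


def pick_backend(path: str) -> str:
    """Return the name of the library that would handle this file."""
    suffix = Path(path).suffix.lower()
    if suffix not in BACKENDS:
        raise ValueError(f"unsupported extension: {suffix!r}")
    return BACKENDS[suffix]


def to_html_or_text(path: str) -> str:
    """Convert an Office file to HTML (docx, xlsx) or plain text (pptx).

    Imports are lazy, so only the library for the given format is needed.
    """
    backend = pick_backend(path)
    if backend == "mammoth":
        import mammoth

        with open(path, "rb") as f:
            # convert_to_html returns a result object; .value is the HTML.
            return mammoth.convert_to_html(f).value
    if backend == "python-pptx":
        from pptx import Presentation

        lines = []
        for slide in Presentation(path).slides:
            for shape in slide.shapes:
                if shape.has_text_frame:
                    lines.append(shape.text_frame.text)
        return "\n".join(lines)
    # pandas reads .xlsx through openpyxl; only the first sheet is read here.
    import pandas as pd

    return pd.read_excel(path).to_html()
```

All three are pure Python readers of the Office Open XML files themselves, so nothing here needs Windows or an installed copy of Office, which is presumably why they were preferred over COM.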