This is amazing. I hope you'll keep working on it. There's always a long tail of details that need taking care of when trying to cover a large corpus, and ploughing through successive 80%'s is (as you are no doubt acutely aware) serious grunt work. But you've made a fabulous start, so I hope you find the stamina to do it!
Yeah, even building upon Pandoc's LaTeX parsing, 3 months of grunt work got us this 20% working. Over the next 12 months we'll get the other 80% working. :)