Check out pandoc. With it you can convert your documents to Markdown, then use common *nix text utilities to merge the parts you want into a single file and load that into InDesign. This article (http://rhythmus.be/md2indd/) inspired me to automate extracting protocol specs from Word documents (bluurgh).
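A rough sketch of that pipeline in Python, in case it helps. It assumes pandoc is installed and on your PATH, and that you hand the result to InDesign as ICML (a format pandoc can write). The function names, the `merged.md` / `merged.icml` file names and the working directory are all made up for illustration. Here the merge simply concatenates whole files; picking out only certain parts would need extra filtering on the Markdown.

```python
import subprocess
from pathlib import Path


def pandoc_command(src: Path, dst: Path, to_format: str = "markdown",
                   standalone: bool = False) -> list[str]:
    """Build a pandoc invocation; pandoc infers the input format from the extension."""
    cmd = ["pandoc", str(src), "-t", to_format, "-o", str(dst)]
    if standalone:
        cmd.append("-s")  # produce a complete document rather than a fragment
    return cmd


def merge_markdown(parts: list[Path], merged: Path) -> None:
    """Concatenate Markdown files in order, separated by one blank line."""
    texts = [p.read_text(encoding="utf-8").strip() for p in parts]
    merged.write_text("\n\n".join(texts) + "\n", encoding="utf-8")


def convert_and_merge(docs: list[Path], workdir: Path) -> Path:
    """Convert each document to Markdown, merge them, then write ICML (hypothetical workflow)."""
    workdir.mkdir(parents=True, exist_ok=True)
    md_files = []
    for doc in docs:
        md = workdir / (doc.stem + ".md")
        subprocess.run(pandoc_command(doc, md), check=True)
        md_files.append(md)
    merged = workdir / "merged.md"
    merge_markdown(md_files, merged)
    icml = workdir / "merged.icml"
    subprocess.run(pandoc_command(merged, icml, "icml", standalone=True), check=True)
    return icml
```

You would call it with something like `convert_and_merge([Path("spec1.docx"), Path("spec2.docx")], Path("build"))` (file names invented) and then place the resulting ICML file in InDesign.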
Nothing specific comes to mind, but perhaps you could find inspiration in the competitions section of Kaggle (https://www.kaggle.com/competitions). At the moment it lists 170 competitions, including completed ones.