You could use LibreOffice's command-line interface to convert from .doc to a more manageable format:

  lowriter --convert-to odt some-document.doc

odt is not the only supported target, but going doc --libreoffice--> odt --pandoc--> plain seems to give better results than, e.g., doc --libreoffice--> txt or doc --libreoffice--> docx --pandoc--> plain.
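
A rough sketch of the doc -> odt -> plain route as a single shell function, assuming both lowriter and pandoc are on your PATH. The name doc_to_plain and the temporary directory are made up for illustration; --headless and --outdir are LibreOffice options that are not part of the command above, and I have not tuned the pandoc options at all.

```shell
#!/bin/sh
# doc_to_plain (hypothetical helper): print the plain text of one .doc file.
# Assumes lowriter (LibreOffice) and pandoc are installed.
doc_to_plain() {
    src=$1
    base=$(basename "$src" .doc)
    tmpdir=$(mktemp -d) || return 1

    # Step 1: doc -> odt. --headless avoids opening a window,
    # --outdir keeps the intermediate file out of the working directory.
    if ! lowriter --headless --convert-to odt --outdir "$tmpdir" "$src" >/dev/null; then
        rm -rf "$tmpdir"
        return 1
    fi

    # Step 2: odt -> plain text, written to stdout.
    pandoc -f odt -t plain "$tmpdir/$base.odt"
    status=$?

    # Remove the intermediate odt file either way.
    rm -rf "$tmpdir"
    return $status
}
```

Something like `doc_to_plain some-document.doc > some-document.txt` would then leave you with a text file you can feed to whatever comes next.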

If that's the case, I'll stick with catdoc. My use case is to build a full-text search index of the content, and I'd rather not trade catdoc for the LibreOffice CLI just for that. But thanks.

