
Ask HN: DOCX to Markdown and Markdown to DOCX - tmaly
I am looking for a solution that would allow me to convert DOCX to Markdown and also the reverse. I would like to be able to support embedded pictures as well as tables. Is there such a system?

My ultimate goal is to be able to track the documents as Markdown in a git repository.

I came across this:

https://github.com/benweet/stackedit

which has some conversion from Markdown to PDF, but I do not see PDF to Markdown.
======
brudgers
A .docx file is a zip archive containing one or more .xml files that describe
the document. I once had the idea of giving meaningful names to images in
one...after all, it's XML, and XML is just text and supposed to be human
readable. It isn't. Word documents are rather complex; that's where the "one
or more .xml files" comes in. It's why Microsoft built its own open standard
back in the aughts when Sun was arguing that OpenDocument was good enough, i.e.
Microsoft would have had to remove functionality from Office to implement it.

Which is to give some background on why .docx -> .md -> .docx might not do
what users want it to do, if what users want at the end is what they put in at
the beginning or something close to it. Markdown simply does not contain as
much information as .docx. Even "only" going to .html will probably be a mess:
last time I dealt with an html file generated from Word, it was full of local
markup <span>'s, but it might have become better since it's been a while.

Anyway, if I were trying to version track .docx files, I'd probably do
something like unzip -> commit -> zip and forget about trying to read diffs in
the repository.
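
For anyone who wants to try the unzip -> commit -> zip idea, here is a minimal
sketch in Python using only the standard library. The function names
(unpack_docx, repack_docx) are made up for illustration, and I have not
checked whether Word accepts every archive repacked this way, so treat it as
a starting point rather than a finished tool.

```python
import zipfile
from pathlib import Path


def unpack_docx(docx_path, out_dir):
    """Extract every part of a .docx (a zip archive) into out_dir.

    Returns the sorted list of part names found in the archive.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(docx_path) as archive:
        archive.extractall(out)
        return sorted(archive.namelist())


def repack_docx(src_dir, docx_path):
    """Zip the files under src_dir back into a .docx.

    Paths are stored relative to src_dir, with forward slashes.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(src).as_posix())
```

The unpacked directory is what you would commit. The diffs will be XML, so
they are unlikely to be pleasant to read, but the history is there.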

Good luck.

------
ukz
Pandoc supports conversion from DOCX to Markdown and vice versa:
[http://pandoc.org/](http://pandoc.org/)
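
A rough sketch of both directions, wrapped in Python so it can be scripted.
It assumes pandoc is installed and on your PATH; the helper names and the
"media" directory are made up for illustration. The --extract-media option
asks pandoc to write embedded images out to a directory and link to them from
the Markdown. How well pictures and tables survive the round trip depends on
the document, so test it on your own files.

```python
import subprocess


def docx_to_markdown_cmd(docx_path, md_path, media_dir="media"):
    """Build a pandoc command that converts DOCX to GitHub-flavored Markdown.

    Embedded images are extracted into media_dir (an assumed directory name).
    """
    return [
        "pandoc", docx_path,
        "--from", "docx",
        "--to", "gfm",
        f"--extract-media={media_dir}",
        "--output", md_path,
    ]


def markdown_to_docx_cmd(md_path, docx_path):
    """Build a pandoc command that converts Markdown back to DOCX."""
    return [
        "pandoc", md_path,
        "--from", "gfm",
        "--to", "docx",
        "--output", docx_path,
    ]


def run(cmd):
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)
```

Usage would be something like run(docx_to_markdown_cmd("report.docx",
"report.md")), then committing report.md and the media directory together.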

~~~
tmaly
Have you tried it with embedded pictures and tables in DOCX?

