
Ask HN: Why did Microsoft not use HTML instead of .doc as Word doc format? - bluecat22
If one were to write a word processor from scratch in 2018, should they use HTML as the document format rather than .doc or anything else, considering that every browser is a free document viewer?
In other words, are there any big technical differences that make .doc better than HTML as a document format, or vice versa?
======
zhte415
Because when you design something, you design for different priorities,
implicit or explicit.

.doc was supported / co-existed with .rtf and .txt at the beginning of Word.

Designing for interoperability, .txt and .rtf were good enough. Mac as the
other major OS had an MS Office suite. Interoperability/importing WordPerfect
was important.

Designing for backwards compatibility was a necessity for Microsoft, which
supported different OSs and legacy personal and business users.

Designing for file size was very important. Most people who use Word don't
use styles, so your HTML would be filled with inline CSS (which didn't exist
at the time), and file size would definitely be impacted.
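
To put a rough number on the inline CSS point, here is a minimal Python sketch. The 200-paragraph document, the style string, and the `body-text` class name are all made up for illustration; this is not output from Word or any real exporter. It only compares repeating the same formatting on every paragraph against declaring it once.

```python
# Hypothetical document: 200 paragraphs that all carry the same manual formatting.
STYLE = "font-family: Arial; font-size: 11pt; color: #333333"
paragraphs = [f"Paragraph {i}" for i in range(200)]


def render_inline(paras):
    """Repeat the full style on every paragraph, as unstyled manual formatting would."""
    body = "\n".join(f'<p style="{STYLE}">{p}</p>' for p in paras)
    return f"<html><body>\n{body}\n</body></html>"


def render_with_class(paras):
    """Declare the style once and reference it by class name."""
    css = f"<style>.body-text {{ {STYLE} }}</style>"
    body = "\n".join(f'<p class="body-text">{p}</p>' for p in paras)
    return f"<html><head>{css}</head><body>\n{body}\n</body></html>"


inline_size = len(render_inline(paragraphs).encode("utf-8"))
class_size = len(render_with_class(paragraphs).encode("utf-8"))
print(f"inline: {inline_size} bytes, shared class: {class_size} bytes")
```

The gap grows with the number of paragraphs and the length of the repeated style, which is why it mattered more when storage was measured in kilobytes.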

Designing for the page was important. Word was about printed / print-like
documents. HTML struggles to do today what Word and other word processors have
done for decades and continue to iterate on.

Designing for user experience really catapulted Word above competition. I used
and liked WordPerfect but it was a blue screen DOS application. When I saw
Word, I had WYSIWYG! And I had copy-paste. There was a time when copy-paste
was a new thing for many users not in the *NIX/OS2/Amiga/Atari/BeOS world. (A
year later I discovered Unix, and that's key also: design for discoverable
features and platforms.)

So, what are you designing for? Figure that out, then pick your implementation
method.

------
mockindignant
A few answers here:
[https://news.ycombinator.com/item?id=17949299](https://news.ycombinator.com/item?id=17949299)

~~~
mtmail
And the same question 14 days ago.
[https://news.ycombinator.com/item?id=18046382](https://news.ycombinator.com/item?id=18046382)

------
Someone
One aspect where .doc beats .html is in the ability to rapidly write small
changes to disk. Open a document, edit a few characters near the start, and
save. .html would have to write the full document out to disk; .doc could
write less than a kilobyte.

That was extremely useful at a time when programs crashed all the time,
necessitating frequent saving, and floppy disks, with write speeds on the
order of 50 kilobytes per second, were the main storage medium.

This feature lives on in Word as “fast saving”, but can be disabled.
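
The general idea can be sketched in a few lines of Python. To be clear, this is not the actual .doc layout: the header, the record format, and the function names `full_save`, `fast_save`, and `load` are assumptions made up for illustration. A full save rewrites everything; a fast save appends a small edit record that the loader replays on open.

```python
import io
import struct

# Assumed toy layout: [base length][base text][edit record]*
HEADER = struct.Struct("<I")    # length of the base text in bytes
EDIT = struct.Struct("<III")    # offset, bytes deleted, bytes inserted


def full_save(f, text: bytes) -> int:
    """Rewrite the whole file; returns the number of bytes written."""
    f.seek(0)
    f.truncate()
    return f.write(HEADER.pack(len(text)) + text)


def fast_save(f, offset: int, deleted: int, inserted: bytes) -> int:
    """Append one edit record to the end of the file; returns bytes written."""
    f.seek(0, io.SEEK_END)
    return f.write(EDIT.pack(offset, deleted, len(inserted)) + inserted)


def load(f) -> bytes:
    """Read the base text, then replay the appended edits in order."""
    f.seek(0)
    (base_len,) = HEADER.unpack(f.read(HEADER.size))
    text = bytearray(f.read(base_len))
    while True:
        head = f.read(EDIT.size)
        if len(head) < EDIT.size:
            break  # no more edit records
        offset, deleted, ins_len = EDIT.unpack(head)
        text[offset:offset + deleted] = f.read(ins_len)
    return bytes(text)


doc = io.BytesIO()  # stands in for a file on a floppy
original = b"Hello world. " + b"x" * 100_000
full_bytes = full_save(doc, original)
fast_bytes = fast_save(doc, 0, 5, b"Howdy")  # change the first word only
restored = load(doc)
print(full_bytes, fast_bytes, restored[:12])
```

The trade-off is that the file keeps growing and loading gets slower until the next full save compacts it. A plain .html file has no comparable place to append changes without the reader knowing how to replay them.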

------
mockindignant
The doc format predates HTML by a decade or thereabouts.

~~~
bluecat22
Is it technically superior/inferior to HTML as a document format for a word
processor?

~~~
savethefuture
As long as you know how to access the data you store, it is mostly irrelevant
what format it is in. Bytes are bytes, but fewer bytes means smaller files.

------
wesnerm2
Microsoft did develop an HTML office format in Office 2000 for Word, Excel,
and PowerPoint, designed to be a replacement for the binary formats. It
included extensive embedded XML. Users and companies still used the original.
With Word HTML, people wanted all the Office-specific code stripped out.
Internet Explorer was necessary if you wanted accurate rendering.

I was a developer in the Excel group in 2000.

~~~
bluecat22
So there is no technical limitation of HTML itself compared to XML (doc
format)? HTML would be just as good a replacement for XML (.doc format)
if someone tried to make a word processor for it?

~~~
wesnerm2
The HTML document alternative was a massive technical hack. It was a flawed
mandate from executives in order to maintain relevance in a web-based world.
Embedded XML and custom styles were used to implement the format. Also, the
IE team worked closely with Office to support richer text formatting. HTML was
not enough. Even then, the format still could not support all the features in
the product--features like versioning, simultaneous editing, OLE embedding,
programmability, etc.

------
navjack27
Also, what you are saying doesn't make any sense. Word itself is the document
editor. The format just stores the metadata and actual data. Word takes that
and turns it into the document you see. It almost seems like you are asking
each doc file to be a standalone document and editor in one file.
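
A rough Python sketch of that separation, under made-up assumptions: the `Run` class and the `to_html`, `to_json`, and `from_json` helpers are hypothetical and have nothing to do with how Word is actually built. The editor works on an in-memory model, and the on-disk format is just one of several possible serializations of it.

```python
import html
import json
from dataclasses import dataclass, asdict


@dataclass
class Run:
    """A span of text with one formatting flag; a stand-in for a real document model."""
    text: str
    bold: bool = False


def to_html(runs):
    """Serialize the model as HTML, escaping text so it survives as markup."""
    parts = []
    for r in runs:
        t = html.escape(r.text)
        parts.append(f"<b>{t}</b>" if r.bold else t)
    return "<p>" + "".join(parts) + "</p>"


def to_json(runs):
    """Serialize the same model in a different storage format."""
    return json.dumps([asdict(r) for r in runs])


def from_json(data):
    """Rebuild the in-memory model from the stored form."""
    return [Run(**d) for d in json.loads(data)]


doc = [Run("Fish & "), Run("chips", bold=True)]
html_out = to_html(doc)
roundtrip = from_json(to_json(doc))
print(html_out)
print(roundtrip == doc)
```

Either output can be written to disk; what the user sees on screen comes from the editor rendering the model, not from the file itself.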

~~~
bluecat22
I meant: if someone were to create a word processor in 2018 with the least
amount of work, shouldn't they use HTML as the format of the document,
assuming no need for backwards compatibility with any other doc formats?

------
masonic
For one thing, it was created as a proprietary format that predates HTML.

------
navjack27
XML...

