Hacker News
The immortality of Microsoft Word (versionstory.com)
41 points by jpbryan 6 hours ago | 60 comments




"git doesn't really work ... because docx is a binary blob."

Well, yes, but the binary blob is a zip archive of a directory of text XML files, and one could imagine tooling that wraps the git interaction in an unzip/zip bracket.
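For illustration, the unzip/zip bracket could be sketched roughly like this in Python. The function names and the `contract.docx` file are hypothetical, and the demo builds a two-part stand-in archive rather than a real Word document. Whether Word accepts a repacked file depends on details this sketch ignores.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack_docx(docx_path: Path, out_dir: Path) -> list[str]:
    """Extract every part of the .docx zip container into out_dir."""
    with zipfile.ZipFile(docx_path) as archive:
        archive.extractall(out_dir)
        return sorted(archive.namelist())


def pack_docx(src_dir: Path, docx_path: Path) -> None:
    """Zip a directory of parts back into a single container file."""
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in sorted(p for p in src_dir.rglob("*") if p.is_file()):
            # Store paths relative to the directory, with forward slashes.
            archive.write(part, part.relative_to(src_dir).as_posix())


def round_trip_demo() -> list[str]:
    """Build a tiny stand-in archive, unpack it, repack it, list the parts."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        original = tmp_path / "contract.docx"  # hypothetical file name
        # Stand-in parts only; a real .docx contains many more.
        with zipfile.ZipFile(original, "w") as archive:
            archive.writestr("[Content_Types].xml", "<Types/>")
            archive.writestr("word/document.xml", "<w:document/>")
        tree = tmp_path / "contract.docx.d"  # this directory is what git would track
        unpack_docx(original, tree)
        repacked = tmp_path / "repacked.docx"
        pack_docx(tree, repacked)
        with zipfile.ZipFile(repacked) as archive:
            return sorted(archive.namelist())
```

A wrapper along these lines would run `unpack_docx` before a commit and `pack_docx` after a checkout, so git only ever sees the XML text.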

The real problem is that lawyers, like basically all other non-programmers, neither know nor care about the sequence of bytes that makes a file in the minds of programmers. In their minds the file IS what they see when they open it in Word: a sequence of white rectangles with text laid out on it in specific ways, including tables with borders, etc. The fact that a lot of really complicated stuff goes on inside the file to get the WYSIWYG rendering is not only irrelevant to them, it's unknown.

Maybe the answer here will be along the lines of Karpathy's musings about making LLMs work directly with pixels (images of text), instead of encoded text and tokenizers [1]. An AI tool would take the document in its visually standard legal form, read it, and produce output with edits, redlines, etc. as directed by the user.

[1] https://x.com/karpathy/status/1980397031542989305


Diffing the XML is a complete nonstarter. I've spent years working with the OpenXML format and can assure you it is very complex even for a professional software engineer with 10 years of experience.

The diff of the document (referred to as a "redline") is what lawyers send to the client and their counterparties. It's essential that the redline is legible for all parties and reflects their professionalism.

Moreover, it is not enough to see the structural changes between the versions. A lawyer needs to see the formatting changes between the versions as well which cannot be accomplished by diffing XML files.
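To make the formatting point concrete, here is a minimal sketch in Python. It assumes the usual WordprocessingML layout (runs `w:r`, run properties `w:rPr`, text `w:t`). The two sample paragraphs and the helper names are made up for illustration. A comparison that only looks at text sees no change, while the run properties show that the clause title became bold.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical before/after versions of one paragraph: same words, new bold.
OLD = f'<w:p xmlns:w="{W}"><w:r><w:t>Governing Law</w:t></w:r></w:p>'
NEW = (
    f'<w:p xmlns:w="{W}"><w:r><w:rPr><w:b/></w:rPr>'
    "<w:t>Governing Law</w:t></w:r></w:p>"
)


def runs(paragraph_xml: str) -> list[tuple[str, tuple[str, ...]]]:
    """Return (text, formatting property names) for each run in a paragraph."""
    paragraph = ET.fromstring(paragraph_xml)
    result = []
    for run in paragraph.findall("w:r", NS):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        props = run.find("w:rPr", NS)
        names: tuple[str, ...] = ()
        if props is not None:
            # Strip the "{namespace}" prefix so "b" stands for bold, etc.
            names = tuple(sorted(child.tag.split("}")[1] for child in props))
        result.append((text, names))
    return result


def plain_text(paragraph_xml: str) -> str:
    """Concatenate the text of all runs, ignoring formatting."""
    return "".join(text for text, _ in runs(paragraph_xml))
```

Here `plain_text(OLD) == plain_text(NEW)`, yet `runs(OLD)` and `runs(NEW)` differ. Turning that difference into a redline a client can read is the hard part, and this sketch does not attempt it.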


And, importantly, there already is an official diff tool: the "Compare" button.

Correct. Solely relying on the built in Word Compare tool results in a whole host of version control issues, however, which I outline in detail in my post "On Building Git for Lawyers."

https://theredline.versionstory.com/p/on-building-git-for-la...


Something I've started doing in my workflow is using Pandoc to convert between Markdown and DOCX when authoring long documents. This lets me put the Markdown into Git and apply the Gemini CLI to it. When referencing other documents, I'll also convert them to MD and drop them into a folder so I can tell the AI to read them and cross-reference things.

At the start of the project the Markdown is authoritative, and the DOCX is just for previewing the styling. (Pandoc can insert the text into a layout template with place holders.)

Towards the end of a project I'll start treating the DOCX as authoritative but continue generating Markdown from it, so I can run the AI over it as a final proof-read or whatever.

This is similar to what people used to do with DocBook, but with a more friendly text format and a more AI-friendly "modern" workflow with Git, etc...
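A rough sketch of the two conversions, as Python wrappers around the Pandoc CLI. The helper names and file names are hypothetical. `--reference-doc` is Pandoc's option for borrowing styles from an existing DOCX; whether that matches the template mechanism described above is an assumption, as is the choice of `gfm` as the Markdown flavour.

```python
import subprocess
from pathlib import Path
from typing import Optional


def docx_to_markdown_cmd(docx: Path, markdown: Path) -> list[str]:
    """Build the pandoc command that turns a DOCX into Markdown for Git."""
    return [
        "pandoc", str(docx),
        "--from", "docx",
        "--to", "gfm",
        "--output", str(markdown),
    ]


def markdown_to_docx_cmd(
    markdown: Path, docx: Path, template: Optional[Path] = None
) -> list[str]:
    """Build the pandoc command that renders Markdown into a styled DOCX."""
    cmd = [
        "pandoc", str(markdown),
        "--from", "gfm",
        "--to", "docx",
        "--output", str(docx),
    ]
    if template is not None:
        # Reuse the styles of an existing DOCX (assumed to be the firm's template).
        cmd.append(f"--reference-doc={template}")
    return cmd


def run(cmd: list[str]) -> None:
    """Execute a command built above; requires pandoc to be on PATH."""
    subprocess.run(cmd, check=True)
```

Early in a project you would call `run(markdown_to_docx_cmd(...))` for previews; once the DOCX becomes authoritative, `run(docx_to_markdown_cmd(...))` regenerates the Markdown for the final AI pass.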


I do this with AsciiDoc instead: same advantages with Git and LLMs, but you get a tremendous amount more styling and functionality.

It's easy to think that Word's functionality is what you see on the Ribbon, mentally map that to Google Docs, and think that the latter can replace the former. But Word is extremely deep. The templating and style sheets allow for a level of fine grained control that doesn't exist in alternatives. There are features that exist purely for the legal market, like Table of Authorities, and customizable line numbering and hyphenation.

Maybe one day there'll be a product to replace Word, but it won't succeed by claiming to be a generalist replacement but only as a niche product that solves a particularly painful problem for lawyers and then expands over time to capture more use cases.


I completely agree.

On the Google Docs front — I wrote specifically about its viability as a Word successor in an earlier post, "Why Lawyers Will Never Use Google Docs".

https://theredline.versionstory.com/p/why-lawyers-will-never...


I swear Google Docs also used to do a better job of replicating Word's ribbon, and has slowly pruned it of a lot of features that are individually niche, but cumulatively very important.

Word's "ribbon" is a shitshow, and has been since day one.

It's depressing to read about Word's entrenchment. This entire once-great application is now an execrable mess, with menus scattered under cryptic buttons (and abridged into dumbed-down menus that require you to expose yet another, collapsed one to access essential, frequently-used functions), a file... thing (not even a dialog, let alone a proper File dialog) that shows you a canned list of locations in a UI that appears to consist only of text...

The style-handling is even messed up, once one of Word's great strengths.


I will say in favor of the ribbon: it still fully supports KeyTips (a.k.a tapping Alt and then a series of letters to navigate a software menu). So much Electron-based slop software out there doesn't support any sort of keyboard navigation of the application at all.

I do find the ribbon somehow weirdly intuitive for navigating with the keyboard, but it was of course possible to navigate drop-down menus in the exact same way (Alt and a series of underlined letters) for years before that. And still is... When developers bother to write robust software.


This is going to be very true right up until it isn't.

Yeah, I know that sounds fake-deep but we've seen this before; I'm old enough to remember when WordPerfect was the standard that wasn't going anywhere.

It will just be one of those inflection-point thingies.


I don't necessarily disagree with you, but I did want to point out that a big part of what made it possible for Word to displace WordPerfect in the legal world was, literally, the fact that Word implemented full support for WordPerfect's file format including all sorts of weird quirky edge cases.

So, an analogous "Word-killer" today would presumably have to implement all of the docx format's weird quirks etc. On the one hand, the file format is standardized and open, so in principle that should be possible; on the other hand, it's a pretty gnarly file format, with a lot of nooks and crannies. Ironically, I remember hearing once that some of the weirder nooks and crannies of the docx format have their roots in... Word's WordPerfect interoperability features.

And as somebody who recently spent far more time than he expected to trying to reliably get data _out_ of a set of mildly-complicated docx files, I can report that the various fiddly details that the OP notes as being particularly important in the legal domain --- very specific details of paragraph formatting, complex table structures, etc. --- are a huge PITA to deal with when working with the docx format.
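To give a flavour of the fiddliness, here is a minimal Python sketch that pulls table cells out of a docx using only the standard library. It assumes the main part lives at `word/document.xml` and uses a made-up two-row sample; `read_tables` and `make_sample_docx` are hypothetical helpers. Note that the text of one cell can be split across several runs, and that a merged cell only reveals itself through `w:gridSpan`.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Made-up sample: a header cell spanning two columns, then a two-cell row
# whose second cell has its text split across two runs.
SAMPLE_DOCUMENT = f"""<w:document xmlns:w="{W}"><w:body><w:tbl>
<w:tr><w:tc><w:tcPr><w:gridSpan w:val="2"/></w:tcPr>
<w:p><w:r><w:t>Fee schedule</w:t></w:r></w:p></w:tc></w:tr>
<w:tr><w:tc><w:p><w:r><w:t>Retainer</w:t></w:r></w:p></w:tc>
<w:tc><w:p><w:r><w:t>5,</w:t></w:r><w:r><w:t>000</w:t></w:r></w:p></w:tc></w:tr>
</w:tbl></w:body></w:document>"""


def make_sample_docx() -> bytes:
    """Wrap the sample XML in a zip container, standing in for a real file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", SAMPLE_DOCUMENT)
    return buffer.getvalue()


def read_tables(docx_bytes: bytes) -> list[list[tuple[str, int]]]:
    """Return each table row as a list of (cell text, column span)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    rows = []
    for row in root.iter(f"{{{W}}}tr"):
        cells = []
        for cell in row.findall("w:tc", NS):
            # Join every text node in the cell; runs may split one value.
            text = "".join(t.text or "" for t in cell.iter(f"{{{W}}}t"))
            span = cell.find("w:tcPr/w:gridSpan", NS)
            width = int(span.get(f"{{{W}}}val")) if span is not None else 1
            cells.append((text, width))
        rows.append(cells)
    return rows
```

This deliberately ignores nested tables, vertical merges, and paragraph boundaries inside a cell, which is roughly where the real pain starts.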


Yes, exactly. A successor could theoretically replace Word, but first it needs to replicate all of its existing functionality.

For a competitor to supplant Word, it would need to:

- Be fully backwards compatible with .docx. Lawyers will inevitably receive .docx files from counterparties that they need to review, redline, and mark up. The new processor has to handle everything Word does flawlessly. (As an engineer who has spent considerable time building a high-quality docx comparison engine, I can tell you this is tremendously difficult.)

- If it introduces a new file format, support seamless comparison and conversion between that format and .docx. Not technically impossible, but also tremendously difficult with marginal upside.

- Defeat the Microsoft Office bundle in the market — meaning it either offers enough advantage that organizations pay for both, or it replaces Excel, PowerPoint, and Outlook too.

Given the enormous challenge of building a viable Word competitor and the marginal room for improvement that Microsoft has left on the table, I think it's very unlikely that a competitor will threaten its market position.


For certain legal use cases SaaS is still a non-starter due to security concerns so this hypothetical MS Word competitor would also need a native local application option. I don't think Google is interested in that market.

As the US government becomes more erratic and untrustworthy it will encourage large organisations to look for alternatives to American software and services.

The stated intent of the US National Security Strategy is to destabilise and undermine Europe. That is a big incentive for European organisations to replace Windows, Office, and any other Microsoft service.

Linux and LibreOffice usage will grow as a direct consequence of the US government's new antipathy to Europe.


Imo, as long as companies are paying for E3 licenses, they won't pay for another solution. And they'll be paying Microsoft for licenses as long as they have Active Directory, right? Seems like the whole Microsoft ecosystem is built on AD (and probably Excel too)

Yes, AD is the value proposition. Your employees can get cloud-synced, multi-user real-time editing of documents. This is what kills the "but my Linux app can do it for free."

It runs on-premise, has all kinds of certificates and has a history of half a century (give or take). That kills Google Docs.

It's cross-platform, killing whatever Apple thinks it has.

Too many people think Word is a text editor. I'd use Notepad++ if it had full AD integration. But it doesn't.


French and German governments are working on an alternative to this: https://docs.numerique.gouv.fr/home/

They also have Grist, an Airtable replacement.

Which I see this "suite numerique" integrates as well.


> And they'll be paying Microsoft for licenses as long as they have Active Directory, right?

They'll be paying long beyond on-prem AD as well. EntraID is becoming the new identity system. If you're already on E3/E5, you might as well make use of it, and making most use of it means being stuck in the whole Microsoft ecosystem.

Why bother looking for alternatives, even if one particular product might be better, when Microsoft gives you literally everything at at least a mediocre level, for one price and pre-integrated.


>Why bother looking for alternatives, even if one particular product might be better, when Microsoft gives you literally everything at at least a mediocre level, for one price and pre-integrated.

This is exactly why we switched from Zoom to Teams


If you keep linking enough problematic options in an "all or nothing" package, at some point people flip to the other choice you are giving them.

It's looking like Windows will be more of an issue here than anything in Office. But either way they can only push people so far.


> It's looking like Windows will be more of an issue here than anything in Office.

And even then, Microsoft will be happy because even if Windows were to disappear tomorrow, people would still be buying Microsoft 365 licenses and just using the very same tech and app stack from their Mac.


Yes. WordPerfect was the favorite of lawyers for a long time.

I'm surprised Google Docs doesn't support all the features lawyers need by now. Seems like a market they'd want to go after, and their .docx conversion seems decent enough for basic formatting, tables, etc.

Curious what the top 3 features are that are missing. The article only mentions multi-level decimal clause numbering (e.g. 9.1.2). Seems like it would be a very easy feature to add. I've heard that line numbering is also a big legal thing, but Docs already has that.
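For reference, the counting logic behind multi-level decimal numbering is small. A minimal Python sketch, where `number_clauses` is a hypothetical helper that takes nesting depths (1 = top level) and returns labels like 9.1.2. It says nothing about how Word or Docs store list definitions, or how numbering survives editing and conversion.

```python
def number_clauses(levels: list[int]) -> list[str]:
    """Turn a sequence of nesting depths into decimal clause labels."""
    counters: list[int] = []
    labels = []
    for depth in levels:
        if depth < 1 or depth > len(counters) + 1:
            # A clause cannot skip a level (e.g. go from 1 straight to 1.1.1).
            raise ValueError(f"cannot jump to depth {depth}")
        if depth > len(counters):
            # First clause at a new, deeper level.
            counters.append(1)
        else:
            # Back at an existing level: drop deeper counters, bump this one.
            del counters[depth:]
            counters[depth - 1] += 1
        labels.append(".".join(map(str, counters)))
    return labels
```

For example, `number_clauses([1, 2, 2, 3, 1, 2])` returns `["1", "1.1", "1.2", "1.2.1", "2", "2.1"]`.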


I actually wrote a detailed breakdown of why Google Docs doesn't meet lawyers' needs!

https://theredline.versionstory.com/p/why-lawyers-will-never...

The short answer is Google Docs:

- Requires all-or-nothing adoption which is a non-starter for law-firms

- Does not support commit atomicity

- Does not store a comprehensive history of the document


It's also not nearly as scriptable as Word is. Word has had macros ("fields") since its first Windows versions, VBA for over 20 years now, it's easy to develop complex add-ons - where I live we've had one for grammar checking for decades now (speaking of that, Google Docs' language features for less popular languages are far behind Word's). Various software supports export to Word and some programs even import from it. You'd be surprised what levels of automation have been achieved with Word.

Files are also easily shared (on physical media, email, no need for anyone to have a Google account to edit and send them back), encrypted, burned onto a CD for storage. DOC/DOCX are ubiquitous and stable file formats. No worries about data leaks in the cloud as it's all local by default...


Correct. Lawyers load their Word instances up with many add-ins specific to their practice. Microsoft Word add-ins are the entire product surface for many legal tech companies.

It's somewhat analogous to how coders use add-ins in their IDE but if only one IDE could run them.


Oh, that makes tons of sense. Yeah, that would basically mean a switch to anything would never happen, I definitely see that. Thank you!

Amazing thanks!

So as far as formatting goes, it seems like it's only list formatting and small caps you've identified, am I missing anything else? (I am baffled by Docs' refusal to add small caps.)

But then as far as workflow is concerned, I'm not sure Docs is as unusable as you say it is -- the commit atomicity and comprehensive history aren't supported by Word either, are they? That's just a function of maintaining 20 separate copies of the file with each set of changes. You can still do that with Docs if you want to, rather than relying on the version history. And then "Tools > Compare documents" lets you merge in all the changes from another document, in an atomic way if you want. And if you want to use the revision history in a "master" version, you can use named versions as well.

Yes, everybody at the firm needs to use Docs. That's not unique to law -- every company that switches from MS365 to Google makes that kind of overnight transition, but it makes sense because you're paying one company or the other, not both.

It's the communication between firms that is going to be stuck in .docx basically forever though, so this is where Google needs to improve its conversion. Ideally Google would also build a "send a copy/transfer" feature so a firm can receive a Google Doc but know that from the moment it "opens" it, a new copy is made on their local Drive so the sending firm never sees edits or activity. But because that feels like it would be too easy to mess up, I think actual .docx file attachments will themselves be immortal, even if both sides used Docs.


>the commit atomicity and comprehensive history aren't supported by Word either, are they? That's just a function of maintaining 20 separate copies of the file with each set of changes.

Sure, you could, but that defeats the purpose of Google Docs which is to make the document collaborative. If you save each iteration in a different Doc, you might as well use Word.

It would also add friction to the workflow because a lawyer would need to download the document from Google Docs whenever they circulate it to a client or counterparty.

The best solution to the problem, in my opinion, is a docx native version control system. I write about how that works in our product Version Story in "On Building Git for Lawyers."

https://theredline.versionstory.com/p/on-building-git-for-la...


> If you save each iteration in a different Doc, you might as well use Word.

Funnily enough, that's how I (and a lot of people I know) use Google Docs.

The version history is great if you accidentally delete something and want to go back, but I don't know anyone who relies on the version history as a kind of meaningful archive -- it's just too fragile. Unless you create named versions, changes get collapsed, and when you make a copy, the version history doesn't get copied.

And it doesn't prevent collaboration -- multiple people can still collaborate on one set of changes in one "branch" file, while other people can collaborate on another set of changes in another "branch" file. When collaboration is done on both, they can get merged into the master file.

You've definitely convinced me that Docs doesn't work for law firms, but mainly for other reasons. Using multiple versions of files doesn't defeat the purpose of Docs -- it still makes collaboration much easier, and nobody's stuck e-mailing files back and forth that are out-of-date by the time they're opened.

Your idea of a VCS for .docx is intriguing though. Good luck!


Thank you!

Sounds like lawyers should be using Git and Markdown! Ha I know...

Docx conversion isn’t great actually. I happened to open a docx with embedded PNG images and Google Docs couldn’t display them. If they whiff on a widely used image format like PNG, I imagine there are a lot of shortcomings.

Oof, oh yeah. I mainly deal with text, but I remember there are multiple image formats I think Word supports that Docs doesn't. Also basic vector drawings don't convert. I don't understand how stuff like that hasn't been fixed by now.

When you cryptographically sign something you need to be sure of what you are signing. That means that you have to be able to look at the document and see each and every character that you are agreeing to. So in a future where we actually work out how to do cryptographic signing we will be forced to use something very close to plain text. You need a format that will not unexpectedly change on you sometime in the future. That might break the signature or worse, maliciously change the meaning of the document without breaking the signature.

Opaque blobs like docx are not suitable for applications where the content of the document has to be completely clear to the various competing parties involved in something like a contract. It only works because the document gets printed out and then signed with a pen. If we want to move past that we need something different.
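A small Python sketch of why a container format is awkward to sign. A signature covers a digest of the bytes, so a SHA-256 hash stands in for signing here. The clause text, timestamps, and helper names are made up. Two saves with identical content but different zip timestamps produce different container digests, even though the text inside is unchanged.

```python
import hashlib
import io
import zipfile


def build_container(text: str, timestamp: tuple[int, int, int, int, int, int]) -> bytes:
    """Zip one text part, stamping the entry with the given modification time."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        info = zipfile.ZipInfo("word/document.xml", date_time=timestamp)
        archive.writestr(info, text)
    return buffer.getvalue()


def digest(data: bytes) -> str:
    """SHA-256 of the raw bytes, i.e. what a signature would cover."""
    return hashlib.sha256(data).hexdigest()


def content_digest(container: bytes) -> str:
    """SHA-256 of the text inside the container, ignoring zip metadata."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        return digest(archive.read("word/document.xml"))


CLAUSE = "The Buyer shall pay the Seller within 30 days."
first_save = build_container(CLAUSE, (2024, 1, 1, 9, 0, 0))
second_save = build_container(CLAUSE, (2024, 1, 2, 9, 0, 0))

print(digest(first_save) == digest(second_save))                  # container bytes differ
print(content_digest(first_save) == content_digest(second_save))  # text is the same
```

The reverse case is the worrying one: a digest over the container tells the signer nothing about how the contents will render, which is the argument for signing something close to plain text.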


I've been doing all my personal notes etc that I want a rich text format in .ODT for decades now and don't regret it one bit.

I do regret being overly paranoid in my 20s and not writing down my master passphrase to my personal documents -- I lost a huge chunk of diaries and writings due to that.

Fun fact: ODT uses Blowfish encryption. Remember when we made Bruce Schneier a meme like Chuck Norris? He wrote it -- apparently it's faster than AES?

Anyways, if you save with password in a .ODT file, if you pick a strong password you've got a nice little self contained encrypted volume that doesn't require "suspicious" software to open.

ANYWAYS, a bit of a tangent but... looking forward to death of Word.


I'm sure ODT works well for many personal use cases, but can guarantee it will never see adoption in the legal industry. Microsoft Word is the only viable option for lawyers.

>I'm sure ODT works well for many personal use cases, but can guarantee it will never see adoption in the legal industry. Microsoft Word is the only viable option for lawyers.

The legal industry also uses MD5 to certify digital evidence hasn't been tampered with, that too will eventually bite them in the ass.


> I'm sure ODT works well for many personal use cases, but can guarantee it will never see adoption in the legal industry. Microsoft Word is the only viable option for lawyers.

I'm a lawyer, though I'm practicing in a wholly different legal system (Romanic civil law) and another country. Why would you say that?

No issues against .docx and Word per se, but I hate that stupid ribbon with undying hatred. Thus I use LibreOffice as much as I can, while maintaining a licensed Office 365 setup under dual boot with Windows for cases when I have no other choice.


I don't think it's too surprising that another country's legal profession would have a different culture than that of the US. When OP says that ODT will never see adoption in the legal industry, I think it's fair to say there was an implied "in the US" there.

Your post comes off as a little patronizing, and still does not answer the question about the "why".

Ironic claims for me to see: observing the legal profession was the first time I noticed formatting could be irrelevant. LexisNexis on a text browser with dot matrix was just as legally binding as the case references before it. Similarly the early ToS display.

Anybody else been using Microsoft Word since before Windows when it was Multi-Tool Word?

> For coders, visual aesthetics don’t matter. For lawyers, they are a technical requirement. While this difference may seem arbitrary on the surface, it is downstream of a critical technical difference between the two fields. Machines interpret the work of coders. Human institutions interpret the work of lawyers.

I believe this is not only infuriating, I am pretty sure it is actually illegal. If lawyers would think that visuals are more important than semantics, they would explicitly discriminate blind people.


>If lawyers would think that visuals are more important than semantics

I never claimed that it was more important than semantics. But it is, nonetheless, essential.


The proposals suggesting markdown + git just sound like tone deaf proposals from 'coders' that are trying to push their square peg through a round hole. More appropriate proposals would be LaTeX + git or Word + subversion or some other vcs that has good binary support

Nonetheless, agree with the author that I don't see anything disturbing Word in that space for a long time, as good luck trying to get a middle-aged lawyer with minimal tech understanding to learn and use LaTeX over Word.


Version control + binary file support is of limited utility for lawyers. They need to see what's changed from version to version, including formatting changes. This is why we spent years building out the document processing technology needed to build version control for Microsoft Word.

I write a lot more about it in an earlier essay, "On Building Git for Lawyers."

https://theredline.versionstory.com/p/on-building-git-for-la...


I'm a lawyer, though most of my writing is related to litigation, not contracts. I did this in my 30s. It worked just fine and made beautiful PDFs. However, judges got pissed, because they have their own software to automatically sign proposed orders. Colleagues got irritated because they received only PDFs. Eventually I got fed up with it because LaTeX is truly a PITA for drafting legal citations and information.

Another example of specific formatting can be seen with the U.S. Government Publishing Office Style Manual, iterating since 1908: https://www.govinfo.gov/collection/gpo-style-manual?path=/GP...

Can you still use the words "women", "gender" and "class"?

What about LaTeX or Typst?

It works but produces PDFs. That becomes a problem. More importantly, you spend FAR more time writing documents using LaTeX than Word. The friction is enough to make writing legal stuff with it not worth the pain.

The same arguments of this essay apply to LaTeX and Typst

The first argument actually leans in favor of LaTeX or Typst as a better replacement for Docx.

A LaTeX or Typst document can contain both the content and formatting together within the same file. This isn't idiomatic for either language, and my experience is that this is more common for Typst than LaTeX, but both can do so. All of those formatting rules like small caps, table widths, margins, page numbering, etc.? Those can be rigidly defined in either LaTeX or Typst and are better guarded against accidental breaches of the formatting rules from a double click, copy/paste, or table cell insertion than in Word.

I'm more sympathetic to the network effect argument. It's hard to envision a reasonable redline system compatible with both Docx and LaTeX/Typst.


I remember when Word hit the streets. Lawyers at the time hated it, and they kept WordPerfect alive because of its footnote logic.

I wonder if M/S got that "fixed"; early on they had a hard time with it.


Interesting. Strangely, no mention of HTML, whose tables can do colspan. Just don’t add CSS

HTML can represent a docx file in a web application but it can never replace docx. Docx files are a protocol for sharing documents between lawyers. I go into more detail on that in the "Docx is a protocol, not a filetype" portion of the essay.

https://open.substack.com/pub/versionstory/p/on-the-immortal...



