
Microsoft Has Manually Patched Their Equation Editor Executable - dielel
https://0patch.blogspot.com/2017/11/did-microsoft-just-manually-patch-their.html
======
userbinator
Notice the xchg, stosb, and loop instructions. This was definitely written by
a skilled asm programmer --- I've never seen even a compiler at -Os generate
code like that.

This also compels me to "code-golf" the function even more:

    
    
         push edi
         mov edi, [esp+8]
         mov ecx, [esp+12]
         jecxz label2
        label1:
         push ecx
         call sub_416352
         stosb
         pop ecx
         test al, al
         loopnz label1
         jecxz label2
         dec edi
         salc
         stosb
        label2:
         pop edi
         ret
    

Original: 58 bytes; patched: 44; mine: 30.

I've done plenty of patching like this, and indeed the relative "sparseness"
of compiler output very often allows the more functional version to be smaller
than the original. It's amazing how many instructions the original wastes ---
notice how _none_ of ebx, esi, or edi are used, yet they get needlessly pushed
and popped; and despite saving those registers so they could be used locally,
the compiler perplexingly decided to keep all the local variables on the stack
instead. The "jump around a jump", with both of them being the "long" form
(for destinations greater than 128 bytes away, not the case here) is equally
horrible. This may actually be a case where today's compilers _will_ generate
smaller code for the same source.

 _Note that in 32-bit code, memcpy is typically implemented by first copying
blocks of 4 bytes using the movsd (move double word) instruction, while any
remaining bytes are then copied using movsb (move byte). This is efficient in
terms of performance, but whoever was patching this noticed that some space
can be freed by only using movsb, and perhaps sacrificing a nanosecond or
two._

On older processors this was true, but since Ivy Bridge a REP MOVSB will
essentially be as fast but smaller. Look up "enhanced REP MOVSB" for more
information.
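The two copy strategies being contrasted can be sketched in Python as a toy model (illustrative only --- the real code is x86 machine code, and the function names here are invented):

```python
def copy_bytewise(dst: bytearray, src: bytes, n: int) -> None:
    """One tiny byte loop: the 'rep movsb' style -- smallest in code size."""
    for i in range(n):
        dst[i] = src[i]

def copy_blockwise(dst: bytearray, src: bytes, n: int) -> None:
    """4-byte chunks first ('movsd'), then leftover bytes ('movsb') --
    the larger pattern compilers traditionally emitted for memcpy."""
    i = 0
    while n - i >= 4:
        dst[i:i + 4] = src[i:i + 4]  # one dword at a time
        i += 4
    while i < n:                     # remaining 0-3 bytes
        dst[i] = src[i]
        i += 1
```

The size/speed trade-off in the comment is exactly this: the second version needs two loops and chunk bookkeeping, the first needs almost nothing.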

~~~
azag0
How does this go with the often quoted mantra that you can only beat compilers
today if you're an extremely skilled asm programmer? Or is the problem you
describe just about executable size rather than speed?

~~~
simias
Optimizing for size is easier because you only have exactly one metric to
consider: how many bytes your instructions take.

When optimizing for speed you have to consider many factors, like the relative
speed of each instruction, cache behavior (including the size of the cache
lines, associativity, number of levels, relative speed of the levels...),
pipelining, branch prediction, prefetching, whether moving your data to SIMD
registers could be worth it, what to inline and what not to inline, what to
unroll and what not to unroll, constraint solving to optimize things that can
be computed or asserted statically, etc.

~~~
TazeTSchnitzel
And that's for a single processor! There are myriad CPUs the end user could be
using.

~~~
ReverseCold
x86 and x64 are the only two that someone could be using on desktop, right?

~~~
pjc50
Timing rules can be very different even between different models of the same
processor, let alone between different ranges (i3 vs i7) or generations
(Skylake etc). An example:
[https://gmplib.org/~tege/x86-timing.pdf](https://gmplib.org/~tege/x86-timing.pdf)

~~~
Narishma
I don't think there are any differences between an i3 and i7 of the same
generation in terms of instruction timings.

~~~
zacmps
True, caches are a whole different story though.

------
infinity0
Binary hacking FTW.

The semi-official Debian server, alioth.debian.org, where a lot of random
developer stuff is hosted, is stuck on Debian wheezy for various reasons. Most
users, including myself (a Debian Developer), don't have root access to
upgrade the server or install new software.

The version of libapt-inst is too old to support Debian packages with
control.tar.xz members (only control.tar.gz members). So we can't upload newer
Debian packages to various custom APT repos that we host on that server.

I worked around this by looking at the libapt-inst source code, figuring out
how to make it support control.tar.xz instead of control.tar.gz, and
binary-patched libapt-inst.so to have this effect. It's actually fairly
simple:

1\. There is a check for control.tar.gz; the failure branch prints an error
and then returns. I overwrite this with NOPs so it goes into the "success"
branch.

2\. Then later it extracts the control.tar.gz member and pipes it through
gzip. Luckily, nowhere else in the program uses the exact string
"control.tar.gz" or "gzip", so I simply patch the string "control.tar.gz" ->
"control.tar.xz" in the binary and also change "gzip" -> "xz\0\0".

(Actually given the change in (2), (1) is not necessary. But without it you
get a bunch of spurious error messages.)
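A minimal sketch of the same-length string patch in step 2, written in Python for illustration (the function name and toy blob are invented; the real target was the libapt-inst.so file on disk):

```python
def patch_same_length(blob: bytes, old: bytes, new: bytes) -> bytes:
    """Replace one byte string with another of identical length.

    Keeping the length the same means no offsets inside the shared
    object shift, so the rest of the binary is left undisturbed.
    """
    assert len(old) == len(new), "patch must not change the file size"
    assert blob.count(old) == 1, "pattern must be unique in the binary"
    return blob.replace(old, new)

# Toy stand-in for the .so contents: two NUL-terminated strings.
blob = b"\x00control.tar.gz\x00...\x00gzip\x00"
blob = patch_same_length(blob, b"control.tar.gz", b"control.tar.xz")
blob = patch_same_length(blob, b"gzip", b"xz\x00\x00")  # "gzip" -> "xz\0\0"
```

The two asserts capture why the comment notes it was "lucky" that neither string occurred anywhere else in the program.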

Applying this patch makes the resulting .so lose the ability of working with
old control.tar.gz members (which is still needed of course). So my workaround
does this:

LD_PRELOAD=libapt-inst.so.patched apt-ftparchive [..] && apt-ftparchive [..]

i.e. runs it once with the hack to pick up the new-style debs, and once again
without the hack to pick up the old-style debs.

My motto is, "dirty solutions for dirty problems". :D :D :D

~~~
mschuster91
> The semi-official Debian server, alioth.debian.org, where a lot of random
> developer stuff is hosted, is stuck on Debian wheezy for various reasons

Jeez. How is security maintained? That actually scares me a bit.

~~~
mort96
Wheezy is supported until the end of May 2018, so it still gets security
patches.

------
rogerhoward
I'm surprised no one has noted the copyright is to Design Science - this is a
small company in my hometown who are still around. I've spoken with their CEO
a few times and I wouldn't be at all surprised if the source code was lost, or
somehow at least wasn't being made available to Microsoft (I doubt it ever
was). It's a really old school shop who seems to have largely been coasting on
the licensing of this one component for the past couple decades and I wouldn't
at all be shocked to find they no longer are capable of maintaining it
themselves.

~~~
rob74
I noted it - thanks for the background info on the company! I also assume that
either they are not able to maintain the software themselves, or they have
lost the source code, but it might also be that setting up the toolchain to
compile such an old piece of software is more effort than just patching the
binary.

~~~
sjburt
This is such an underappreciated aspect of code stewardship. There are
powerful tools for source control and archiving, but ensuring that a given
state of the code can actually be built at an arbitrary date in the future is
much less assured.

------
dzdt
I once worked at a place which lost part of the source code for their giant
mission-defining application. They spent a decade linking in object code for
which there was no corresponding source code.

The build team was very proud when they announced that the application would
finally start being built from the source code in version control.

Stuff happens!

~~~
bartread
Stuff happens, indeed, and more often than most of us realise.

Getting on for a decade ago now I was working at Red Gate when they bought
.NET Reflector - a decompiler for .NET code - from Lutz Roeder. After the
acquisition we started asking people what they were using it for.

Turns out a significant minority of them were trying to recover lost source
code, or source code they never had in the first place (e.g., where a supplier
went out of business). I don't remember the exact figure but it might even
have run into low double-digit percentages. Bear in mind this is a tool that
was being downloaded tens of thousands of times every month by all manner of
people working for all kinds of organisations of every size and you can see
the scale of the problem.

There were a couple of Reflector add-ins that would allow you to take a .NET
binary and generate a C# or VB.NET Visual Studio project with all source code
from it. The source code was never perfect and wouldn't likely compile first
time, but it was certainly better than starting from scratch. Not surprisingly
these add-ins were among the most popular.

Granted, times have changed, and I think source control is probably the
default for almost everyone these days - although I would have expected that
even in 2008 - but, bottom line: I think this sort of thing happens a _lot_ ,
for one reason or another.

~~~
giancarlostoro
Heh, I can say I know some guys who have done this, and I've done it myself
(with similar but open-source tools). There's also the "what is this sketchy
.NET app really doing" moment, where you want to know it's not doing anything
"funny" to your system, so you peek at the code.

------
nicktelford
There are only two reasons I can think of why they'd patch the binary
directly: either they've lost the source code, or they no longer have an
environment they can build it in.

~~~
leoc
For ages Alan Kay has been claiming to know that MS has lost part of the Word
codebase.

~~~
nimish
Microsoft has lost tons of code over the years. Even with the source,
refactoring office which has people using file formats that are binary dumps
of memory, is not trivial.

~~~
leeter
This was actually the impetus for the switch to the XML formats IIRC. Basically
after the DOJ settlement they had the uneasy realization that the requirement
to document the .doc format was going to be a nightmare because nobody had a
complete spec of it. To make it worse the code wasn't very portable and
customers were asking for x64 support pretty hard at the time.

~~~
rangibaby
Re: 64-bit, is that about Excel? I found the idea of a Word document needing
>4GB of RAM somewhat absurd.

~~~
acdha
Note that due to the way the 32-bit address space was laid out, a process
could really use only about 2GB of RAM — the extra 2GB was reserved for the
system based on old 386-era CPU limitations. Since that space had various
things allocated in it, a normal program generally wouldn't get more than
1.7GB in a single allocation — e.g. according to
[https://support.microsoft.com/en-us/help/313275/-not-enough-...](https://support.microsoft.com/en-us/help/313275/-not-enough-memory-error-messages-when-you-copy-formulas-over-large-ar)
Excel 2003 had a heap limit of about 1GB.

The other thing to keep in mind is that Office documents are, in addition to
being somewhat bloated internally, far more than just text. Think about all of
the people writing things like documentation with hundreds of screenshots (all
uncompressed BMPs), audio or video, etc. — and the people with those are far
more likely to be at the kind of large corporations which expect value from
their support contracts. There were multiple ways to deal with that problem
but in general the easiest was switching to a 64-bit address space.

~~~
yuhong
Recently they had to enable /LARGEADDRESSAWARE in 32-bit Outlook.

------
dawnbreez
While this _does_ suggest that they lost the source code for this program, it
also shows an unbelievable amount of skill.

~~~
porfirium
HN, where writing assembly shows an unbelievable amount of skill.

~~~
dawnbreez
Not just writing assembly, rewriting a compiled object file without letting
any of the addresses change, without having the source to work with, and
presumably with almost no documentation, to patch a program that has been left
untouched for almost 20 years.

~~~
bitexploder
I kind of agree with the sentiment. It isn't that crazy.

We do this as a matter of course all the time. Patching a small handful of
instructions is pretty easy. You could learn to do it in a week or less if you
are a decent programmer.

Doing it well? Doing it quickly? Doing it idiomatically and in a short amount
of time? That takes real skill.

~~~
stevekemp
I used to patch games for infinite-lives, or to allow my serial numbers to be
accepted. Doing this wasn't hard, as somebody who grew up writing assembly
language on 8-bit machines in the 80s.

One fun self-challenge was always to make my modifications as small as
possible. e.g. one-byte changes were a lot more impressive than two-byte
changes.

~~~
bitexploder
Your own personal game genie.

It's interesting. I have observed that if people learn on an 8- or 16-bit
machine, like in Microcorruption, they tend to pick up more complex ISAs much
more easily. It helps to know the first principles.

------
lzybkr
I have no specific insight to this patch, but I do have personal experience
binary patching a popular Microsoft product.

My patch was to the VC++ compiler nearly 20 years ago. We had source, and my
fix was also applied to the source (which I'd imagine is still there today),
but a binary patch also made sense in the short term.

The binary that I patched was used to build another important Microsoft
product, and this bug was found late in the product cycle where any compiler
change was risky.

We weren't 100% confident we had the _exact_ sources used to build that
version of the compiler (git would have been handy then); we only knew, plus
or minus one day, what the sources were.

After carefully evaluating the binary patch versus the risk of building from
uncertain source, the binary patch was taken to reduce risk.

I'm no reverse engineer, but this was a pretty interesting exercise in RE even
though I had sources. I had no symbols, and the binary was optimized so that
functions were not contiguous: cold paths were moved to the end of the binary.
Just finding the code I needed to patch was not easy.

The code review was fun - a dozen or so compiler engineers reviewed the change
on paper printouts - the most thorough review I've had in my career, and the
only one that used paper.

To the best of my knowledge, this binary was never used to build anything
other than that specific version of the product which I won't name - not that
it matters really, the product is still in use, but that version is unlikely
to be in use anywhere anymore.

~~~
dielel
Thanks for sharing this. I suspected that "not being sure if you have the
exactly right source code" could be a real world reason to patch a binary, and
now I know.

------
dtech
That is both pretty impressive and horrific.

I wonder if they patched this way because they wanted to maintain as much
binary compatibility as possible, or if they don't have the original
source/couldn't reproduce the build process.

~~~
gizmo
Horrific? This is what you do when you want to make sure you don't introduce
any unintentional changes. Computers aren't magic, and there is nothing
_wrong_ about patching a binary.

Compiling the software with a modern compiler or linking to a modern runtime
is very likely to bring obscure bugs in the codebase to the surface. It's
pretty hard to replicate the entire build process that produced the original
binary, even if they have the source code and everything else on hand.

~~~
dtech
> Horrific? This is what you do when you want to make sure you don't introduce
> any unintentional changes.

Horrific, because the average programmer would consider patching the binary a
worst-case scenario.

> there is nothing wrong about patching a binary

I would only trust a skilled assembly programmer to do this task without
creating other problems, and most businesses don't have those on retainer.

~~~
criddell
> Horrific, because the average programmer would consider patching the binary
> a worst-case scenario.

That says more about the average programmer than it does about the
reasonableness of binary patches.

I used to work in MASM on the 8088 and have dabbled a little on
microcontrollers (6800) as well. But the last time I looked at modern x86, I
was pretty lost.

~~~
Coincoin
Admittedly, x86 assembler is a total clusterfuck.

------
be5invis
Bear in mind that MSFT may not have the source code of Equation Editor, since
it is a simplified version of MathType.

------
Someone
This ‘old’ equation editor is a limited version of MathType
([https://en.wikipedia.org/wiki/MathType#Microsoft_Equation_Ed...](https://en.wikipedia.org/wiki/MathType#Microsoft_Equation_Editor))
that has been supplanted by a built-in equation editor.

Chances are that Microsoft doesn’t have a license for bug fixes from Design
Science (makers of MathType) anymore and isn’t willing to pay for this fix.

Alternatively, Design Science may not be able to deliver a version that, for
maximum backwards compatibility, has only this fix (to minimize risk, they
would have had to keep an environment around that hosts the compiler used back
then).

------
dmitriid
One reason for doing it this way is possibly this:

> Well, have you ever met a C/C++ compiler that would put all functions in a
> 500+ KB executable on exactly the same address in the module after
> rebuilding a modified source code, especially when these modifications
> changed the amount of code in several functions?

It's quite possible they are still contractually obligated to maintain some
pretty old systems where changes to the .exe would produce unexpected
behaviour. I had Access apps/databases crash on a system if they were built by
a different version of Access.

------
magnat
Slightly off-topic: what program is used to produce disassembly graphs as
those in article?

~~~
kristofferR
[https://www.hex-rays.com/products/ida/](https://www.hex-rays.com/products/ida/)

IDA is widely regarded as the best disassembler and debugger out there. It
comes with a price to match too though.

~~~
pjc50
It is something of a rite of passage in the piracy community to crack your own
copy of IDA Pro.

It's also a rite of passage to distribute cracked and boobytrapped copies on
filesharing sites...

~~~
londons_explore
Oh, I remember the old "this will only work if your timezone is set to Moscow"
version...

------
_pmf_
Ah, this brings up a lot of fond memories of me in high school preparing
presentations using this fine piece of software[0], before replacing it with a
1GB open-source equation editor called LaTeX.

[0] It was actually quite usable once you got to know its warts.

------
yoz-y
The article mentions that the timestamp of compilation gets embedded into the
binary. When does this happen? I'm used to getting identical binaries when
recompiling the same source code with the same flags (and compiler, and so
on).

~~~
svenfaw
What compiler do you use? Almost all of them embed a compilation timestamp
(which is one of the reasons reproducible builds are often a challenge).

~~~
yoz-y
Mainly the Visual C++ compiler and Clang.

------
lunixbochs
Binary patching is a really common requirement in attack/defense CTF, and
there are a few projects floating around to help with it.

Keypatch helps you do assembly overwrites in IDA Pro.

Binary Ninja lets you do assembly (and C shellcode!) overwrite patches, and
even has undo.

I have my own project [1] for patching ELFs that relies on injecting
additional segments and injecting a hook at any address, so as to not require
in-place patches. It can also massage GCC/Clang output and inject _that_
reliably into an existing binary.

[1]
[https://github.com/lunixbochs/patchkit](https://github.com/lunixbochs/patchkit)

I have my own story about this as well. A few years ago I released a port of
Uplink: Hacker Elite for the OpenPandora handheld with a few game engine
patches, and some people were running into a bug: the game would enter the
"new game" screen on every launch, even if you already had a save game to
load.

I couldn't find the exact source I'd used to build it and didn't want to spend
time making sure I got all of my bugfixes into the vanilla repository,
so... I went digging with IDA, found the topmost branch to the "new game"
wizard, and patched the address to go to the main menu function instead. At
that point you could still click "new game" from the menu and it wouldn't go
through the patched address (so "new game" still worked), but you could also
load an existing game, thus fixing the bug!
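A branch patch like that amounts to rewriting the rel32 displacement of one x86 near jump. A hypothetical Python sketch of the mechanics (all offsets and addresses invented for illustration; nothing here is from the actual Uplink binary):

```python
import struct

def retarget_near_branch(code: bytearray, off: int, new_dest: int) -> None:
    """Point an E8 (call) / E9 (jmp) instruction at `off` to a new offset.

    x86 encodes the target relative to the end of the 5-byte
    instruction: rel32 = destination - (off + 5).
    """
    if code[off] not in (0xE8, 0xE9):
        raise ValueError("not a near call/jmp")
    struct.pack_into("<i", code, off + 1, new_dest - (off + 5))

# Toy code segment: a jmp at offset 0 originally targeting offset 0x100
# (the "new game" wizard), retargeted to 0x40 (the "main menu").
code = bytearray(b"\xE9" + struct.pack("<i", 0x100 - 5) + b"\x90" * 0x200)
retarget_near_branch(code, 0, 0x40)
```

Because only the displacement changes, the instruction stays 5 bytes and nothing else in the binary moves.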

I still have nothing on Notaz, who statically recompiled StarCraft and Diablo
for that community :)

------
alexeiz
It's an old program whose source code may either no longer compile with a
modern C++ compiler, or be lost. Back in 2000, Microsoft was using Visual
Source Safe for managing its source code. I wouldn't be surprised if nobody
can remember where the heck the VSS repository with that source code is
located.

That leaves the binary monkey-patching as the only reasonable solution. I'm
pretty sure Raymond Chen still works at Microsoft...

~~~
ajross
Binary patching is really only reasonable when the source code is indeed lost.
If they had the code but simply needed a compiler that worked, they could have
rebuilt it using the same toolchain and build environment it was built with to
begin with. Old versions of Windows and MSVC are obviously still around.

------
jws
Just a historical note: Patching used to be much more common. Back in the Vax
VMS days the image file format (executables, not pictures) had a section for
patches.

From the ANALYZE/IMAGE command…

 _Patch information --- Indicates whether the image has been patched (changed
without having been recompiled or reassembled and relinked). If a patch is
present, the actual patch code can be displayed. (VAX and Alpha only.)_

------
lima
And here I am, manually patching Docker containers...

~~~
atupis
Just why?

~~~
dullgiulio
Because they are immutable /s

------
foobarbecue
I suppose the fact that they have patched the binary means they can never
again patch the source?

~~~
artursapek
They'd have to re-implement the patch in source before doing anything else to
it. I wonder if they are no longer able to build from source... why else would
they resort to this?

~~~
foobarbecue
As explained in the article and in other comments, it's possible that there
are dependencies that rely on the contents keeping their addresses, or on the
file size staying constant.

------
anon1253
They probably just lost the ability to build it, or the source code can't be
found. Happens quite often. 17 years is a /long/ time to maintain build
systems and remember where you put the files.

~~~
nathan_f77
I'm glad that most of the software development community seems to have settled
on git. I get the feeling that I'll still have all of the source code for my
projects in 20 years.

Redundant backups are especially important for software companies. It's scary
to think how many startups give all cofounders and developers admin access to
everything. It helps that git is distributed, but it's not hard to imagine a
scenario where a ticked off former employee wipes everyone's laptops and
deletes the hosted source code.

Even if you don't update the mirrors regularly, it's good to know that you
have some copies of data in BitBucket/GitLab/Heroku/Google Drive.

~~~
Merad
I don't know if I would hold your breath. 10 years ago I think most people
hadn't even heard of git (it was ~2 years old), Google Code was the hot new
thing, and GitHub was a year or so away from creation. At the time most people
seemed to be pretty content with Subversion and hosting on SourceForge (before
it turned evil) or Google Code, but in the next ~5 years everything changed.
Granted, git, GitHub, etc. have far more momentum than anything that came
before, but this is a field where it feels like the only constant is change.

------
alkonaut
If the thing is 17 years old and a replacement has existed since forever, what
purpose does this file have today? (Assuming I'm on a modern Windows, I run
either no MS Office or a modern Office version.)

~~~
whatthesmack
From the article:

> While Office has had a new Equation Editor integrated since at least version
> 2007, Microsoft can't simply remove EQNEDT32.EXE (the old Equation Editor)
> from Office as there are probably tons of old documents out there containing
> equations in this old format, which would then become un-editable.

~~~
alkonaut
Ah, missed that. But obviously I'd be very happy for this program to be
patched by replacing it with this program:

    
    
      MessageBox.Show("This document contains an old equation and you don't have the editor. Do you want to download the old editor?");
    

Because there comes a point in time when, any time you bump into an equation
like this, it's actually more likely to be a malicious one.

Even better if they could at least render the old equation statically using
the new office, but not edit it. Then it would be almost insanely rare that
anyone needs the old editor.

~~~
twoodfin
This is the kind of thing that can rapidly escalate to a CTO asking his
Microsoft sales VP why he's spending $18M/year on upgrade and support
contracts when a report that's worked "forever" starts talking back like
Clippy.

Microsoft doesn't preserve backward compatibility because they're stubborn;
it's a key part of their value proposition to some of their biggest clients.

~~~
alkonaut
This is the thing: I'm also a paying customer, I just don't pay as much. But
I'd like to pay for more security/less compatibility, instead of the other way
around.

This should also be very easy to do e.g. by noticing whether you are in a
setting where there is any risk of the scenario you say. If it's a home
machine for example, then don't worry about compatibility, focus on security.

~~~
WorldMaker
We've seen where that leads to: there is plenty of software out there where
you'd have to open every document you wrote with version N - 1 in version N to
"convert it" to N's format, and version N + 1 can't read N - 1 files at all.

That can lead to a very ugly form of bitrot quickly. Do you convert every
document you've ever touched every time, even ones you haven't needed in
years, just in case? Do you worry that every time you convert a file it might
corrupt the file in the process? Do you find some way to keep every version of
the program available at all times and play try every version until it opens
the file?

Backwards compatibility in general offers much greater means for archival.

In this specific example: losing backwards compatibility for ancient equations
directly threatens the archival of math and science documents. That seems like
it could have huge repercussions in some fields.

~~~
londons_explore
If I were designing the software, there would be a module which can upgrade a
file format from version N to N+1.

That code is written in some kind of sandboxed VM/bytecode. You freeze the
bytecode when you release a version.

When you release version 20 of the app, it has bytecode to convert 1 -> 2,
2->3, 3->4, etc, all the way to version 20. When it finds an old file, it runs
all the updates as necessary.

If there is a bug in the updaters, it stays there forever.

If there is a security problem with the updaters, that's what the sandbox is
for.
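The chained-upgrader scheme described above can be sketched in a few lines of Python (a toy model; in the comment's design each step would be frozen, sandboxed bytecode shipped with its release, and the field names here are invented):

```python
CURRENT_VERSION = 3

def v1_to_v2(doc: dict) -> dict:
    """v2 renamed the 'name' field to 'title'."""
    out = dict(doc, version=2)
    out["title"] = out.pop("name")
    return out

def v2_to_v3(doc: dict) -> dict:
    """v3 added a required 'author' field."""
    out = dict(doc, version=3)
    out.setdefault("author", "unknown")
    return out

# Each converter only knows how to lift version N to N+1.
CONVERTERS = {1: v1_to_v2, 2: v2_to_v3}

def upgrade(doc: dict) -> dict:
    """Run the frozen converters in sequence until the doc is current."""
    while doc["version"] < CURRENT_VERSION:
        doc = CONVERTERS[doc["version"]](doc)
    return doc
```

A version-1 file opened by the version-3 app simply passes through both steps, so no release ever has to understand more than one format transition.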

------
sswaner
Reminds me of that time Mark Watney used a similar method to patch his rover’s
comms to connect to an old radio system.

------
abainbridge
I wonder how they checked the fix into the source control system?

~~~
misterdata
'FCIB', apparently:
[https://blogs.msdn.microsoft.com/oldnewthing/20171114-00/?p=...](https://blogs.msdn.microsoft.com/oldnewthing/20171114-00/?p=97396)

~~~
Freak_NL
> F-C-I-B or as a sort-of acronym eff-sib […] stands for "foreign checked-in
> binary" […] The term FCIB didn't originally mean "foreign checked-in
> binary". According to legend […] "Not another f—ing checked-in binary!"

------
tzahola
There are plenty of companies hooking into private APIs within Word and Excel
with their “productivity tools”. Probably an important MSFT customer was using
one of these tools as a crucial part of their operations, so they convinced
them not to break it. Just like how Google had to put special cases in Android
to keep compatibility with some hacks Facebook was using in their app.

------
dingo_bat
I salute and respect the guy who did this while hoping I never have to do
anything like this.

~~~
wruza
I used to dream that I'd _have_ to do something like that, until the dream
faded under the weight of modern <script src> programming. It's like dancing
twist, rock, and hardbass in the era of electronic arse shaking.

------
yuhong
I noticed a 0F 1F NOP, which breaks older processors. This is in an update
that goes back to Office 2007.

------
porfirium
So Microsoft lost the source code? Or maybe the engineer couldn't be bothered
to set up the old toolchain used to build this executable?

~~~
moonbug22
They probably bought it in from a third party. Why do you think Pinball went away?

~~~
tjalfi
Raymond Chen answered this question five years ago[0].

[0]
[https://blogs.msdn.microsoft.com/oldnewthing/20121218-00/?p=...](https://blogs.msdn.microsoft.com/oldnewthing/20121218-00/?p=5803)

