
Pure: A Static Analysis File Format Checker - jorangreef
https://github.com/ronomon/pure
======
jorangreef
The backstory:

Somewhere between an HN thread on July 4 2019
([https://news.ycombinator.com/item?id=20352439](https://news.ycombinator.com/item?id=20352439))
and David Fifield's generous references to @ronomon/zip on his website and in
his "A better zip bomb" paper, I received an email from Maxim Vainstein, an
R&D Lead at Microsoft and the head of their Product Release and Security team,
asking me to port @ronomon/zip from JavaScript to C, so they could use it as a
static analysis tool to scan all software released by Microsoft.

Microsoft were happy for me to licence the work under the MIT licence and so
@ronomon/pure was released today. I have only a few years' experience in C and
am confident there will be some fairly embarrassing flaws in the code, so
please let me know what you think. I am also considering a port to Zig as a safer,
simpler implementation that remains C-ABI compatible for portability and easy
embedding.

At present, Pure does exhaustive file format checks on zip files, but I want
to expand Pure as an open-source static analysis tool for more file formats,
starting with MS-CFBF Office files. This recent paper in Virus Bulletin reports
some staggering results, with static analysis detecting 90% of zero-day
exploits in Office formats (see the table at the end):
[https://www.virusbulletin.com/uploads/pdf/magazine/2019/VB20...](https://www.virusbulletin.com/uploads/pdf/magazine/2019/VB2019-Shah.pdf)
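
To give a flavour of what a static check on a zip file can look like, here is a minimal sketch in C that validates a single local file header against the field layout in the ZIP specification. It is not taken from Pure and does not reflect its API: the function name, the result codes and the ratio limit are all hypothetical, and the sketch deliberately gives up on data descriptors and Zip64 rather than handling them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch; not Pure's actual error codes. */
enum sketch_result {
    SKETCH_OK = 0,
    SKETCH_TOO_SMALL,
    SKETCH_BAD_SIGNATURE,
    SKETCH_OVERFLOW,
    SKETCH_UNSUPPORTED,
    SKETCH_BAD_METHOD,
    SKETCH_SIZE_MISMATCH,
    SKETCH_SUSPICIOUS_RATIO
};

/* Fixed part of a zip local file header, in bytes. */
#define SKETCH_LOCAL_HEADER_MIN 30u
/* Assumed upper bound on a plausible DEFLATE expansion ratio. */
#define SKETCH_MAX_RATIO 1032u

/* Zip fields are little-endian; read them byte by byte to stay portable. */
static uint16_t sketch_read_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t sketch_read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Check one local file header (and the room for its data) at the start of buf. */
enum sketch_result sketch_check_local_header(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len < SKETCH_LOCAL_HEADER_MIN) return SKETCH_TOO_SMALL;
    if (sketch_read_u32(buf) != 0x04034b50u) return SKETCH_BAD_SIGNATURE;

    uint16_t flags = sketch_read_u16(buf + 6);
    uint16_t method = sketch_read_u16(buf + 8);
    uint32_t compressed = sketch_read_u32(buf + 18);
    uint32_t uncompressed = sketch_read_u32(buf + 22);
    size_t variable = (size_t)sketch_read_u16(buf + 26)   /* file name length */
                    + (size_t)sketch_read_u16(buf + 28);  /* extra field length */

    /* The file name, extra field and compressed data must fit in the buffer. */
    size_t remaining = len - SKETCH_LOCAL_HEADER_MIN;
    if (variable > remaining) return SKETCH_OVERFLOW;
    remaining -= variable;
    if ((uint64_t)compressed > (uint64_t)remaining) return SKETCH_OVERFLOW;

    /* Bit 3 defers the sizes to a data descriptor; this sketch gives up there. */
    if (flags & 0x0008u) return SKETCH_UNSUPPORTED;

    /* Accept only stored (0) and deflate (8) entries. */
    if (method != 0 && method != 8) return SKETCH_BAD_METHOD;

    /* A stored entry cannot change size. */
    if (method == 0 && compressed != uncompressed) return SKETCH_SIZE_MISMATCH;

    /* Fail fast on an implausible expansion ratio (zip bomb heuristic). */
    if (method == 8 &&
        (uint64_t)uncompressed > (uint64_t)compressed * SKETCH_MAX_RATIO) {
        return SKETCH_SUSPICIOUS_RATIO;
    }
    return SKETCH_OK;
}
```

The point of the sketch is the fail-fast shape: every field is bounds-checked before it is trusted, and anything the checker does not understand is rejected rather than guessed at. A real checker would also have to cross-check the central directory, overlapping entries and much more.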

I think email might prove to be the perfect place to put something like Pure
to good use. A combination of policy (no executables, no macros) and static
analysis on the remaining file formats can narrow the gap and obviate the need
for machine learning or CVE-laden antivirus. It could protect whole groups of
users through an opt-in "please defend me from malware email attachments"
mode, without requiring vendors of buggy software to improve the quality of
their software. At the same time, it moves away from Postel's Law and helps to
enforce and uphold open standards and debug software with fail-fast feedback.

Email is the number one delivery vehicle for malware, but most email providers
don't have the open-source tools available to protect their users. My hope is
that independent email providers such as Hey and Fastmail will consider
sponsoring work on new file formats in Pure and come on board to encourage
adoption.

