
Converting untrusted PDFs into trusted ones: The Qubes Way (2013) - kawera
http://blog.invisiblethings.org/2013/02/21/converting-untrusted-pdfs-into-trusted.html
======
legulere
> A somehow better approach is to parse the original PDF, disassemble it into
> pieces, and then reassemble them into a new PDF only using the “trusted”
> pieces

I wish this approach were used more often, as it also makes it easy to
deprecate things in your file formats. What you usually get instead is a huge
mess of code that supports everything that ever existed, and often even the
standards themselves don't drop cruft, in the name of backward compatibility.
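A minimal sketch of the parse/disassemble/reassemble idea from the quote, in
Python. It assumes a made-up line-based format where each line is
`kind:payload`; the piece kinds and the `ALLOWED_KINDS` allowlist are
hypothetical, not real PDF object types, and this is not how Qubes does it (the
article's converter renders to bitmaps instead). Deprecating a feature then
amounts to removing its kind from the allowlist.

```python
from dataclasses import dataclass

# Hypothetical piece kinds for an imaginary document format.
# Anything not listed here (e.g. "javascript", "embedded_file") is dropped.
ALLOWED_KINDS = {"text", "image", "page_break"}


@dataclass(frozen=True)
class Piece:
    kind: str
    payload: bytes


def disassemble(raw: bytes) -> list[Piece]:
    """Split the toy 'kind:payload' line format into pieces.

    Lines without a ':' separator are treated as malformed and skipped.
    """
    pieces = []
    for line in raw.splitlines():
        kind, sep, payload = line.partition(b":")
        if not sep:
            continue  # malformed line: never pass it through
        pieces.append(Piece(kind.decode("ascii", errors="replace"), payload))
    return pieces


def reassemble(pieces: list[Piece]) -> bytes:
    """Write a fresh document containing only allowlisted pieces."""
    kept = [p for p in pieces if p.kind in ALLOWED_KINDS]
    return b"\n".join(p.kind.encode("ascii") + b":" + p.payload for p in kept)


def sanitize(raw: bytes) -> bytes:
    return reassemble(disassemble(raw))
```

For example, `sanitize(b"text:hello\njavascript:alert(1)")` keeps only the
`text` piece. Note that payloads are copied through unchanged here; a real
sanitizer would also have to validate each kept piece, since the allowlist only
decides which kinds of pieces survive.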

The current approach leads to big, unmaintainable codebases riddled with
security holes. Font parsers are a good example of this, as can be seen in
Google Project Zero's font-parsing vulnerability series:
http://googleprojectzero.blogspot.de/2015/07/one-font-vulnerability-to-rule-them-all.html

------
mpweiher
"Converting PDFs to bitmaps."

------
jorangreef
Does anyone have experience safely doing server-side OCR on DOCX files, PDFs,
etc.?

------
whatifitoldyou
OK, in theory one could use an exploit in the PDF to compromise the sandboxed
converter and make it produce a malicious image. ...in theory.

~~~
matteotom
She addresses that: the converter outputs only the image dimensions and a
stream of RGB values, which are easy to parse and verify, so the worst that
could happen is a bad output image.
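To illustrate why that hand-off is easy to verify, here is a minimal Python
sketch of the trusted side's check. The wire format (an 8-byte big-endian
width/height header followed by exactly width * height * 3 RGB bytes) and the
`MAX_DIMENSION` limit are assumptions for illustration; the actual Qubes
converter's format may differ.

```python
import struct

# Hypothetical limits and layout; a real implementation would choose its own.
MAX_DIMENSION = 10_000
HEADER = struct.Struct(">II")  # width, height as big-endian uint32


class BadImage(ValueError):
    """Raised when the untrusted converter's output fails validation."""


def parse_rgb_page(data: bytes) -> tuple[int, int, bytes]:
    """Validate untrusted converter output and return (width, height, pixels).

    Assumed format: 8-byte header (width, height), then exactly
    width * height * 3 bytes of raw RGB. Nothing else is accepted.
    """
    if len(data) < HEADER.size:
        raise BadImage("truncated header")
    width, height = HEADER.unpack_from(data, 0)
    if not (0 < width <= MAX_DIMENSION and 0 < height <= MAX_DIMENSION):
        raise BadImage(f"implausible dimensions {width}x{height}")
    pixels = data[HEADER.size:]
    expected = width * height * 3
    if len(pixels) != expected:
        raise BadImage(f"expected {expected} pixel bytes, got {len(pixels)}")
    # Every byte value 0-255 is a valid channel intensity, so once the
    # sizes check out there is no structure left to attack: the worst
    # case is a wrong-looking picture.
    return width, height, pixels
```

The design point is that the only things to check are two integers and one
length; there are no nested objects, offsets, or optional features for a
compromised converter to abuse.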

