
Filling in PDF forms with Python (2018) - firstbase
https://yoongkang.com/blog/pdf-forms-with-python/
======
LarryMade2
I've used (in PHP) TCPDF (was FPDF) and FPDI. FPDI extends TCPDF to import the
original PDF (used as a background/template), and then TCPDF can write content
(text, lines, shapes, images, etc.) on top of it.

Using fillable PDFs... A lot of PDFs I encountered didn't have fillable fields,
or the fields provided were not as large as I needed to fill them properly, so
I mapped the content on top of the PDF, ignoring any forms. (This might not be
possible if the forms perform some function other than just populating the
form.)

Initially I would identify fields and create a form content database/array.

Overlay a grid on the "template PDF" with TCPDF (point measurement is the
PS/PDF standard) and determine the field location coordinates by hand. (Making
an HTML overlay won't cut it; you need precise measurements.)
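
A minimal sketch of such a field map, in Python rather than PHP. The field names, positions, and helper names are made up for illustration. It assumes positions were measured in millimetres from the top-left corner of a US Letter page and converts them to PDF user space (points, origin at the bottom-left). FPDF-style libraries already measure from the top-left, so the vertical flip may not be needed there.

```python
POINTS_PER_INCH = 72.0   # 1 pt = 1/72 inch in PDF/PostScript
MM_PER_INCH = 25.4
LETTER_HEIGHT_PT = 792.0  # US Letter: 8.5 x 11 in = 612 x 792 pt


def mm_to_pt(mm):
    """Convert millimetres to points."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


def top_left_mm_to_pdf_pt(x_mm, y_mm, page_height_pt):
    """Map a top-left-origin position in mm to bottom-left-origin points."""
    return (mm_to_pt(x_mm), page_height_pt - mm_to_pt(y_mm))


# Hypothetical fields, measured by hand against the grid overlay.
FIELDS_MM = {
    "full_name": (25.4, 50.8),
    "date": (127.0, 50.8),
}

FIELDS_PT = {
    name: top_left_mm_to_pdf_pt(x, y, LETTER_HEIGHT_PT)
    for name, (x, y) in FIELDS_MM.items()
}
```

Keeping the measured values and the conversion separate means a re-measured field only changes one entry in the table.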

Add in some paging logic for handling multi page data, etc. and have it put it
all together.
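
The paging logic might look roughly like this. It is only a sketch: the function names and the row/offset values are assumptions, and y is measured downward from the top of the page, as in FPDF-style libraries.

```python
def paginate(rows, rows_per_page):
    """Split rows into consecutive pages of at most rows_per_page items."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]


def place_rows(rows, rows_per_page, first_y_pt, line_height_pt):
    """Return (page_number, y_pt, row) tuples, restarting y on every page."""
    placed = []
    pages = paginate(rows, rows_per_page)
    for page_number, page_rows in enumerate(pages, start=1):
        for index, row in enumerate(page_rows):
            y_pt = first_y_pt + index * line_height_pt
            placed.append((page_number, y_pt, row))
    return placed


# Hypothetical data: five rows, two rows fit on each template page.
items = ["row 1", "row 2", "row 3", "row 4", "row 5"]
layout = place_rows(items, rows_per_page=2, first_y_pt=100.0, line_height_pt=14.0)
```

Each tuple says which copy of the template page to import and where on it to write the row.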

...

But PHP isn't one for signatures, so Python would be best...

Looking at resources for what I've done in PHP, you have to use two libraries:
one to import the pages (PyPDF2) and one to create the content on the
imported page (pyfpdf). Looks like they are exclusive, so you create the
content, then merge the PDF template with the content PDF.

Someone wrote an example:
[https://gist.github.com/dwayneblew/79da32727358b502f6ec](https://gist.github.com/dwayneblew/79da32727358b502f6ec)

This should get you closer I think.

------
luminadiffusion
I had to solve this problem a few years ago. My solution was as follows:

1\. Convert PDF -> multiple individual SVG files. (I probably used Cairo)

2\. Use Inkscape to set your fields and name them like Django template
variables.

3\. Store these prepared SVGs in the file system of your app.

4\. Call them when needed and fill them in with the Django Template rendering
engine.

4a. Works with including images too if you convert them to base64 encoded PNG,
then insert them into the SVG.

5\. Convert individual SVGs -> individual PDFs. (Cairo)

6\. Merge individual PDFs into a single combined PDF. (Cairo)

7\. Deliver finished merged PDF.
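
Steps 4 and 4a could be sketched like this. To stay standard-library only, `string.Template` stands in for the Django template engine, so placeholders are `$name` rather than Django's `{{ name }}`. The SVG, field names, and image bytes are invented for illustration, and the Cairo conversion and merge steps are not shown.

```python
import base64
from string import Template
from xml.sax.saxutils import escape

# Hypothetical one-page template; in the workflow above this would be a
# prepared SVG loaded from the app's file system.
SVG_TEMPLATE = Template(
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" width="612" height="792">\n'
    '  <text x="72" y="144">$full_name</text>\n'
    '  <image x="72" y="600" width="200" height="50" '
    'xlink:href="data:image/png;base64,$signature_b64"/>\n'
    '</svg>'
)


def render_page(template, fields, png_bytes):
    """Fill text fields (XML-escaped) and embed a PNG as a base64 data URI."""
    values = {name: escape(str(value)) for name, value in fields.items()}
    values["signature_b64"] = base64.b64encode(png_bytes).decode("ascii")
    return template.substitute(values)


# Placeholder bytes only (the PNG file signature), not a real image.
fake_png = b"\x89PNG\r\n\x1a\n"
svg = render_page(SVG_TEMPLATE, {"full_name": "Ada & Co."}, fake_png)
```

Escaping the values matters because an unescaped `&` or `<` in user data would make the SVG invalid before it ever reaches the converter.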

After the initial step of preparing your SVG, which can take a bit of time to
get right, it only takes about 2-3s to produce a fully compiled PDF and gives
you all of the necessary functionality out of them - sans all of these chaotic
intervening libraries.

I can't tell you how many months it took me to figure that out. It was a while
though. When the system was operational, we were sending 10,000 multipage PDFs
per day on a single Django instance on a T2 medium AWS instance.

------
formalsystem
Anyone familiar with the history of how PDFs became such a widespread format
in the first place? I get that it looks nice but not being able to edit it by
default just seems weird to me.

~~~
orev
The “read only” nature of PDFs is a feature, not a bug. The idea being that
once you distribute the PDF, it can’t be changed, and thus has more “truth”
than something editable would. Even now PDF is considered an acceptable format
for legal documents where Word docx is not. Of course this is completely false
safety given that many programs can edit PDFs.

~~~
izacus
> Of course this is completely false safety given that many programs can edit
> PDFs.

Well, unless you sign the PDF. Even more, you can sign each edit separately,
so you can do things like add content and signatures and still verify who
added what. Meaning: one party can create PDF with forms, sign it, then the
party filling out the form can sign their own changes for authentication.

And let's not forget the fact that PDF renders correctly on pretty much any
machine you put it on - this is incredibly important.

------
simon04
In my opinion, one should _always_ try to get changes/fixes/patches applied
upstream. Even if they still need some discussion/tuning. In the long run
everyone benefits. Think of 1000 people maintaining their own fork of the
Linux kernel.

------
mckmk
I had a similar need at a company I worked for. My solution was actually quite
similar to the author's #1 with the major exception being that I used ODG for
LibreOffice Draw which mostly solves the author's two main complaints here.
Background images can be high quality and placing your text is as easy as
clicking where you want to place your text box.

The only other major difference is that I didn't interact with UNO. Since Open
Document Format files are zipped XML files I extracted the content.xml and did
regular expressions for my variables then replaced them.
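
A rough sketch of that unzip, regex, rezip cycle using only the standard library. The `${name}` placeholder syntax and the function names are assumptions (the comment doesn't say what the variables looked like), and the demo archive is a tiny stand-in, not a real ODG file.

```python
import io
import re
import zipfile
from xml.sax.saxutils import escape

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")  # hypothetical ${name} syntax


def fill_odg(odg_bytes, values):
    """Return a copy of the archive with placeholders in content.xml replaced."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odg_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                text = data.decode("utf-8")
                # XML-escape each value so it cannot break the markup.
                text = PLACEHOLDER.sub(
                    lambda m: escape(str(values[m.group(1)])), text)
                data = text.encode("utf-8")
            # Reusing the ZipInfo keeps entry order and compression type,
            # so "mimetype" stays first and uncompressed.
            dst.writestr(info, data)
    return out.getvalue()


def make_demo_archive():
    """Build a minimal stand-in archive in memory for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("mimetype", "application/vnd.oasis.opendocument.graphics")
        z.writestr("content.xml", "<p>${full_name}</p>",
                   compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


filled = fill_odg(make_demo_archive(), {"full_name": "A <B>"})
with zipfile.ZipFile(io.BytesIO(filled)) as z:
    content = z.read("content.xml").decode("utf-8")
    names = z.namelist()
```

A missing key raises `KeyError` here, which is deliberate in this sketch: a form that silently ships with an unfilled placeholder is usually worse than a failed render.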

We did have to do signatures as well but that turns out to be not THAT much of
a pain. If you insert an image on top of your form manually then look at the
resulting file you can pretty much copy the part of the XML that refers to the
inserted image, insert the signature image into the ODG zip and make sure the
names line up and it will work.

It's worth noting that the practice of editing complex XML with regular
expressions is not always advisable. In my experience it works fairly reliably
with ODG because the format remains simple. But, with ODT it can result in
corrupted files quite easily because additional XML can lie in between the text
letters of your variables. Then you'll be on a mission to find and ignore all
the text markup like XML bold markers and span tags and paragraph markers and
style tags.... before you know it your simple unzip regex rezip becomes a
whole library.
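
To illustrate the failure mode, here is a small example with a made-up `${name}` placeholder syntax and hand-written markup (not taken from a real ODT file). Once a span tag lands inside the placeholder, the simple pattern no longer sees it.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")  # hypothetical ${name} syntax

# Placeholder typed in one go: the markup leaves it intact.
clean = "<text:p>${full_name}</text:p>"

# Same placeholder after an edit that put a styling span in the middle.
split = ('<text:p>${full_<text:span text:style-name="T1">name'
         '</text:span>}</text:p>')

found_clean = PLACEHOLDER.findall(clean)   # ['full_name']
found_split = PLACEHOLDER.findall(split)   # []

# Stripping tags makes the placeholder visible again, but by then the
# match positions no longer line up with the original XML.
found_stripped = PLACEHOLDER.findall(re.sub(r"<[^>]+>", "", split))
```

Detecting the placeholder in the stripped text is easy; writing the replacement back into the tagged XML without breaking it is the part that grows into a library.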

~~~
phonethrowaway
I've had great success with this library:

[https://github.com/christopher-ramirez/secretary](https://github.com/christopher-ramirez/secretary)

------
stevenjohns
There is more than one way to deal with this; unfortunately, it's all quite
messy. But it is possible.

One is to use Inkscape to layer the image as an OCG (optional content group);
the other is to treat the image as a "watermark" (which is really just another
layer) and apply it via PyPDF2 or similar.

