So in theory you could just give a few keywords and get the full XML. Currently this is on hold, but if someone has a use case and wants to contribute, I'd be happy to continue working on it. I also provide this other library that extracts some essential data from PDFs. The plan is to use them together to automatically build the XML from just the PDF.
I took a similar approach, using XML stylesheets (XSL) to render the standard invoice as HTML when opened; this also looked nice.
Hopefully OP doesn't learn about how his medical history is possibly passed around.
Meanwhile I write the documentation of my XML configuration files in XSD and convert it to something readable using XSLT. One set of data for processing and display everywhere, and one less headache about duplicated and badly maintained data.
1. You generate invoices in your ERP and have all the info. To make your client's life easier, you embed the info as XML, so he doesn't need to type it from the PDF.
2. You get invoices without XML and use e.g. invoice2data to extract key fields and then add them as XML for later.
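To make the second case a bit more concrete, here is a minimal sketch of turning a handful of extracted fields into XML. The dict below is hand-written and only stands in for whatever an extraction tool returns; the element names (`invoice`, `issuer`, and so on) are made up for illustration and do not follow any real invoice standard. Attaching the result to the PDF is a separate step and is not shown.

```python
import xml.etree.ElementTree as ET


def invoice_fields_to_xml(fields: dict) -> str:
    """Serialize a flat dict of invoice fields into a small XML string."""
    # Hypothetical root element; a real exchange format would define its own schema.
    root = ET.Element("invoice")
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up example values, standing in for fields extracted from a PDF.
fields = {
    "issuer": "ACME GmbH",
    "invoice_number": "2018-0042",
    "date": "2018-11-05",
    "amount": 119.0,
    "currency": "EUR",
}

xml_text = invoice_fields_to_xml(fields)
print(xml_text)
```

A flat key/value layout keeps the sketch short; line items or tax breakdowns would need nested elements.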
The official documentation also seems to recognize those as a security risk :)
Adobe actually offers "features" like readership tracking in PDFs as part of their commercial offerings.
I’ve been told the PDF spec has some low-level functionality to support an MS-DOS emulator. Don’t know how true that is.
I just implemented bulk PDF import this weekend.
It uses pdf.js for rendering the PDFs and extracting the metadata (including the fields discussed in this article).
I have to manage a ton of PDFs for my work / research (mostly textbooks and compsci whitepapers), and before working on Polar I was really struggling to manage all the data.
(Disclaimer: I am not associated with the company)
>full network access
>run at startup
>prevent device from sleeping
Prime offender: Amazon Music. Latest rev AFAICT has no "Quit". Have to resort to Settings->Apps->Force_Stop to get its clutter off the screen.
Disclaimer: I've used the company's SDK fairly extensively in my own product (but I'm not otherwise affiliated with them).
I have found that despite the crude nature of it there are things you can find in documents that other tools gloss over.
Genuine thanks for the link and the nudge - I have to go forensic on a few PDFs and I had forgotten what the tool was, as it has been a while.
Luckily the PDFs I need to edit are old and there should be no problem doing apt-get install rather than compiling a tarball.
Or '87, by '97 I was using Gimp already.
But you can treat a PDF very much like a big zip with some special purpose features. If you want to.
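As a rough illustration of the "big zip" view, the sketch below lists the numbered objects in a PDF-like byte string, much as you would list entries in an archive. The sample bytes are hand-written and not a complete, valid PDF, and the regex scan is deliberately naive: real files rely on cross-reference tables, often pack objects into compressed object streams, and can contain the bytes `endobj` inside stream data, so treat this as an assumption-laden toy rather than a parser.

```python
import re

# Tiny hand-written sample (illustrative only, not a complete PDF).
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
    b"3 0 obj\n<< /Type /EmbeddedFile /Length 5 >>\n"
    b"stream\nhello\nendstream\nendobj\n"
)

# "<number> <generation> obj ... endobj" delimits one indirect object.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def list_objects(data: bytes) -> dict:
    """Map object number -> raw object body (naive scan, no xref handling)."""
    return {int(m.group(1)): m.group(3).strip() for m in OBJ_RE.finditer(data)}


objects = list_objects(SAMPLE)
for number, body in objects.items():
    # Show the first bytes of each "entry".
    print(number, body[:40])
```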
PDFs are a vast and not-so-clean world...