> Edit pdf like a word doc while preserving structure and format.
While I'm sure there are cases where this works pretty well and can be very useful, it may be worth noting that -- just like every other tool in this space -- there will be many PDFs where it simply can't work. It'll be hugely dependent on exactly how the PDF-generating application/tool went about things.
One simple example: suppose your PDF uses a specific font (not one of the standards like Times or Helvetica), so the PDF-generating tool embedded the font. (This is common.) Further, suppose the generating tool embedded a re-encoded subset of the font, including only the glyphs that were actually required. (This is also common.)
Now, suppose the edit you wish to make involves adding a character that was not present in the original document -- let's say you want to change the date from "May" to "June". But the original document contained no occurrences of capital J (in this particular font/style), and so the capital J glyph is not present in the embedded font. No "PDF text editor" can get around this; the best you can hope for is a "J" in some fallback (such as Times) that may look terrible alongside the intended custom font.
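To make the "May" to "June" case concrete, here is a minimal sketch that models a subsetted font as nothing more than the set of characters the original text used. The sample text and the `missing_glyphs` helper are made up for illustration; a real editor would have to inspect the embedded font program itself.

```python
def missing_glyphs(subset_chars: set[str], new_text: str) -> list[str]:
    """Return the characters of new_text that the embedded subset cannot draw."""
    return sorted(set(new_text) - subset_chars)


# Hypothetical document text. A subsetting generator would embed only
# the glyphs needed for these characters (in this font/style).
original_text = "Meeting notes, due on May 14. Please confirm attendance."
subset = set(original_text)

print(missing_glyphs(subset, "June"))  # ['J']
print(missing_glyphs(subset, "May"))   # []
```

Only the capital J is missing here, because "u", "n" and "e" happen to occur elsewhere in the text. That is also why an edit can come out partly in the original font and partly in a fallback.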
And as for edits that would require reflowing multiple lines of text, maybe inserting a new paragraph in the middle of a page, etc.... not much chance of this working out well.
Yes, a tool like this can (in many cases) make it possible to make minor changes (perhaps fixing a typo or updating a word here and there). To suggest that it can "edit pdf like a word doc" seems patently false to me.
One of the weirdest things I've seen is a PDF where the text is complete gibberish if you copy paste, but perfect if you export it to HTML or Word in Acrobat. Never figured out how or why that might happen.
I read somewhere (likely here) that this oddness comes from the idea that PDF is a way of structuring documents for print first, and presentation in a user interface is secondary.
That is, faithful rendering of the document is paramount, as opposed to the ability to manipulate the text itself: "These characters should be displayed at precisely this position in the document."
It would make sense that exporting the document as HTML or Word would make this easier - as these document formats have different goals.
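A rough sketch of what "characters at a position" means for anything that wants the text back. The fragment list, the coordinates and the `naive_lines` helper are all invented for illustration; real PDFs and real extraction libraries are considerably messier.

```python
from collections import defaultdict

# Hypothetical positioned fragments as (x, y, text), in file order.
# Nothing here says "paragraph", "line" or "column".
fragments = [
    (72.0, 700.0, "Hello"),
    (300.0, 700.0, "Right column"),
    (72.0, 686.0, "world"),
    (105.0, 700.0, "there"),
]


def naive_lines(frags, y_tolerance=2.0):
    """Guess lines by bucketing fragments on y, then sorting each bucket by x."""
    rows = defaultdict(list)
    for x, y, text in frags:
        rows[round(y / y_tolerance)].append((x, text))
    # PDF y coordinates grow upward, so the largest y is the top line.
    ordered = sorted(rows.items(), reverse=True)
    return [" ".join(text for _, text in sorted(row)) for _, row in ordered]


print(naive_lines(fragments))  # ['Hello there Right column', 'world']
```

The guess glues the right-hand column onto the first line of the left one, because the file carries no notion of columns. Any tool that copies, extracts or reflows text has to reconstruct that structure with heuristics like this one.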
Finding a font that matches the look of letters in the vicinity of the edit on Google Fonts or similar and embedding the missing characters from that font seems doable and should work pretty well.
Figuring out more global properties like multi-column layouts and reflowing text is a much harder problem.
Would be neat to do font recognition against available web fonts to try to find a match, then convert the web font to TrueType or whatever format the PDF used, and re-embed the font.
Congratulations on the launch. Looks nice. For those who are cautious about uploading sensitive PDFs, you can always just open them with Inkscape and start editing pretty much anything in a document.
PS: I was once forwarded an application for someone who was supposed to replace me, and I had to interview them. The expected salary was hidden by placing a gray rectangle over it. I removed it using inkscape and saw the expected salary was 30% higher than what I made.
Inkscape to Illustrator is what GIMP is to Photoshop.
Glad someone brought it up. For most common tasks, is there really a point in even using Photoshop and Illustrator nowadays? Especially with the cloud direction they are moving towards.
I've been incredibly happy with the Affinity Suite. Smoother and cleaner than GIMP/Inkscape/Scribus, and a desktop app for $50 ($25 during Covid), which makes it more palatable to me than the Adobe options.
because GIMP sucks, maybe, and also because most people are required to interoperate with the rest of the profession. i am very disappointed that photoshop et al cannot be run under WINE :(
I have done/still do complex composition and editing in both GIMP and Photoshop, and there are really no missing features in GIMP. If you go into using GIMP expecting it to be PShop, you will always be disappointed, but in reality that is only a good thing.
For sure. If one (myself included) is used to the niceties of Photoshop and Illustrator, GIMP's user interface leaves much to be desired. However, from an open platform perspective, do I really want many hours of editing work to be tied to a cloud platform and a proprietary format?
Think not just about now or one year from now, but 10, 20, 50, or 100 years from now. Open formats and open source software are much to be desired for creative software. There is the paradox of making money from open source software, and I don't have an answer for that. It seems like some kind of sliding-scale payment will be needed for this software. Maybe also some kind of sliding-scale payment based on the eventual money earned from the software, up to a limit.
I'm curious how that interviewing situation worked out for you. Did you use it to negotiate a higher salary there or at your next employer? No pressure if you don't feel like sharing that, of course.
Once you put an employee in a position where the only way to get a salary hike is to change employers, there is no way back - you change employers. It worked out well for me, and AFAIK the role is not filled yet in the previous company.
I tried using this with an invoice that I'd created using invoice-generator.com, in the hopes that it would be an easier way to make new invoices. When I tried to replace the To party's name, the text came back partly bold and partly not. There was also a weird overlay on an email address on the bottom that said something about email address protected.
Would love to have a tool like this that worked for making new invoices, among other things!
> When I tried to replace the To party's name, the text came back partly bold and partly not.
Most likely, the PDF used a subsetted embedded bold font, so it only worked for letters that happened to be present in the original text; any new letters were missing from the font and you got a fallback.
Just one of many reasons why a tool like this is the wrong way to approach pretty much any document creation/editing task, because PDF is the wrong document format to use for these purposes.
> Most likely, the PDF used a subsetted embedded bold font, so it only worked for letters that happened to be present in the original text; any new letters were missing from the font and you got a fallback.
Wow, that seems to be exactly right, based on which letters were bold and which weren't. Very interesting!
To the folks downthread who asked about the legality of modifying invoices, I was trying to use one invoice as a template for a second invoice. Most things are the same, so it would have been great if I could have made a simple change!
I don't think laws are ever that prescriptive about what you have to do. As far as I know, the only requirement is when the tax inspector arrives, you need to be able to produce a complete and accurate list of all invoices you have ever sent to customers.
At least for Germany: digital invoices shall be processed in a way that manipulation can be ruled out (GoBD). Hence companies use special scanners and document management systems which document the Revisionssicherheit.
And that is the new, more relaxed situation. At first, digital invoices were required to be signed with a qualified signature, i.e. spend a few hundred quid a year on certificates by a few select CAs (only the usual German "suspects" could qualify due to intentionally onerous requirements).
Even before issuance? No such requirement in Finland at least, so clearly not an EU level requirement.
The Finnish VAT Act does, however, require that companies sending and receiving invoices ensure that the mandatory invoice data is not modified after issuance (209 g §).
A few notes. It doesn't seem to work with documents that have multiple columns. Pushing text over just overwrites the other column. It doesn't seem to reflow text where the source document obviously had margins, possibly because that information is looooong gone. Hitting enter to move text to the next line didn't move the other text; it just seemed to overwrite it again.
Yeah. Switch array & args disable the shell. I hope they’re not running that locally as the download script suggests. But then you still have a bunch of other security issues. Shrug.
I'm investigating the same. The upload endpoint uses secure_filename to get the filename used in that func. I'm not familiar with it, but the docs say it could return an empty string.
and it appears as if it would probably run anything you put between ; and # (in this case it will echo hi). Unless the filename is sanitized, which it appears to not be.
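For anyone following along, a small sketch of the two points above: passing the command as an argument list keeps the shell out of it, and an empty sanitized filename needs a fallback. This is not the site's code; `storage_name` and `echo_argument` are hypothetical helpers, and the child Python process merely stands in for whatever converter gets called.

```python
import subprocess
import sys
import uuid


def storage_name(sanitized: str) -> str:
    """Fall back to a random name when sanitizing left nothing usable."""
    return sanitized if sanitized else f"upload-{uuid.uuid4().hex}.pdf"


def echo_argument(filename: str) -> str:
    """Pass filename to a child process as one argv entry, without a shell."""
    result = subprocess.run(
        # Stand-in for a real converter: prints its first argument back.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\r\n")


hostile = "report.pdf; echo hi #"
# With an argument list, ';' and '#' are just characters in one argument.
print(echo_argument(hostile) == hostile)  # True
print(storage_name("") != "")             # True
```

If the same string were interpolated into a single command line and run through a shell, the part between `;` and `#` would be executed as its own command, which is the behaviour described above.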
I used this to delete a few pages from a pdf file. However, I can't download the edited version. The "save and download" option is just resulting in opening another blank webpage.
I really wish the PDF layout was easier to parse. No matter which library you use, you always run into edge cases which make text selection and extraction an issue on certain files. I was recently extracting financial data from a bank which provides only PDFs and every time they changed the format just a little bit I had to change large parts of my code to extract the transactions I wanted.
I agree with this; it's the same with insurance companies when resolving claims. It feels like they want to make the extraction complicated for some unknown reason. Not often, and not all companies, but there are edge cases.
Now I just wish there were a version that could run locally. All the Linux PDF viewers suck -- they can't even save a fill-in PDF form or insert a signature.
I spent an afternoon trying a bunch of them some years ago. I settled on the freeware Master PDF Editor version 4 (version 5 inserts a watermark unless you buy a license). https://code-industry.net/masterpdfeditor/
It is super lightweight and opens any complex PDF. Can insert signatures, can edit anything in the PDF (without changing the font even if the font is embedded in the PDF—as long as all the glyphs you are typing are present in the font), etc. My only complaint is that it won't edit an encrypted PDF, but a one-liner Ghostscript command can remove the encryption automatically: https://gist.github.com/compleatang/6046249
There are actually a plethora of choices that are great for what most people are doing, namely viewing documents. Forms and annotations are also supported by a number of them.
Have you tried Okular, for example? It supports filling and saving forms and annotations. You can draw a signature with the freehand annotation. This works great with a touchscreen, not so great with a mouse.
It also supports a plethora of useful features and looks and works great out of the box.
It actually CAN insert a signature via its Stamp annotation, but unfortunately this doesn't produce an annotation that works outside of Okular. That is due to a current limitation in Poppler, which is why none of the PDF readers based on it have that feature.
Here is the 10 year old bug that nobody is working on that would presumably make it possible for any reader to have this feature.
I think it seems like a major missing feature to you, but it perhaps hasn't received much attention because, while you may use this feature a lot if you sign business documents, 99% of users who are consuming documents or exporting them from word processors neither know this feature exists nor need it.
Maybe people ought to put money towards a bounty for someone to implement it?
The best editor I’ve found is Qoppa PDF Studio. Although not free it is cross-platform (written in Java) and their licenses are perpetual. They do have a free viewer app which can fill interactive forms and this works on Windows, Mac and Linux: Qoppa PDF Studio Viewer.