Stirling-PDF: local web application to perform various operations on PDFs (github.com/frooodle)
507 points by alexzeitler 11 months ago | 231 comments



But... why? WHY??

Why would I run a docker container, a webserver, start a browser, navigate webpages... just to do some operations on a pdf locally?

A few KiloBytes native program like PDFtk (https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/) does the job perfectly.

I don't understand what is the point of bloating softwares like this. Not even speaking of the very bad consequences for the planet.


"does the job perfectly"

Um. I work with PDFs a LOT and.. nah PDFtk is pretty weak. It doesn't even do OCR!

This looks like a wonderful tool that solves a lot of problems that existing tools don't typically put together. It's an achievement, and your post is unnecessarily crass.

This is very much the sort of tool that you host internally at a newsroom so journalists don't have to wrangle with software or write code. Like, in that situation, who cares if it's on top of docker? The users definitely won't give a shit.

Please consider that you're not the target audience....


It may not be right for you, but I can see situations where this would be preferable.

If because of your job you find yourself doing these operations very often, and the ability to do them from several devices with different OSes is valuable, it might be great to throw this on a server.

Or if your workplace has several people, maybe not very technical folks, who do them often. I’ve worked in a couple of places where this could’ve come in handy.

Also, it includes an API. Also, being open source, if in the future you’re creating a web app that needs some of these features, you could learn from/copy from its code.

I think this is a good contribution to the world.

> I don't understand what is the point of bloating softwares like this. Not even speaking of the very bad consequences for the planet.

I partially share this concern. I wouldn’t deploy this for myself unless I had a very easy way to stop it and start it, but anyway while not in use it should only be consuming a bit of RAM, and there’s plenty of very efficient hardware suitable for small servers these days.


A web app makes it cross platform. If you have a homelab, deploy it only once for every client.

And PDFtk doesn’t do annotations afaik which is a huge pain point on Linux (at least for me) because there are no applications that I know of to easily do things that are trivial on OSX like adding text or hand drawn signatures to PDFs. Masterpdf can do it but with a watermark and some limitations.

Maybe it doesn’t suit your particular use case but I wouldn’t say pdftk can replace this project.


Xournal++ was already mentioned; Okular also has annotations and, I think, adding hand-drawn signatures.

Though I welcome (new) work in this area.


Xournal++ is amazing for annotating PDFs. I use it to file taxes where fillable PDFs and e-filing don't work as expected.


Firefox now has some simple built-in PDF editing tools. Text and images can be added on top, but existing text can't be modified.


> no applications that I know of to easily do things that are trivial on OSX like adding text or hand drawn signatures to PDFs

Try xournal++


I think edge would be perfect for you


Edge the browser?


Yes, its pdf editor is very good.


Woah. Very cool. Is this something specific to Microsoft Edge, or also available in other Chromium variants, including Chrome?


Because it's a simple (for me as the user) and reliable way to get a UI without having to send my personal files to some server somewhere. What's the big deal?


The point of the bloat is that web is by far the easiest way to create a cross platform UI. It’s far from ideal but that’s the world we live in.


No, there are easier and more lightweight ways to create a cross-platform UI than to write HTML/CSS/JS: Tkinter, wxPython, PySide6, MiniGUI, Dear ImGui, Nuklear, React Native.


Having used most of those at least trivially, I'd have to say that HTML/CSS/JS and frameworks on top of that are on par with the easiest of those for nontrivial cross-platform UIs, though the others may have other advantages.


If it's a question of UI only, Electron would be less bloated than a webserver? Arguably?


In a hypothetical organization where you have hundreds or more users needing to perform these tasks, is it better to push and maintain software on hundreds of computers, or on one server that is probably multi-use to begin with? Containers are far easier than maintaining software across hundreds of devices.


I’ve seen this sort of reaction quite often from people who are/were used to native (desktop) applications and grew professionally in that paradigm. What happened is that there was a paradigm shift to the web programming model, where a server and a browser interact and each has a specific role and a well defined interface to interact. Then to address software compatibility/versioning/configuration/predictability came the paradigm of containerization. Both paradigms are dominant now and that’s how younger programmers grew and still are growing professionally. The overhead the paradigm introduces in terms of used resources is an acceptable and affordable price to use the paradigm but is likely to seem overpriced for people used to earlier paradigms. A 1960s programmer may be appalled that a larger COBOL compiled binary is taking several times more memory than a hand coded Assembly equivalent. The next generation of programmers may be doing everything by telling ChatGPT a long story of what they want it to build in terms of code.

Point is - paradigms shift, unacceptable prices become affordable. Generational gaps manifest themselves not just between parents and children.


Because PDF is a minefield and PDFtk does not solve all problems on all platforms. You'll learn that if you try to process millions of PDFs in the wild that may or may not comply with any of the numerous specs.


> A few KiloBytes native program like PDFtk

Is this binary statically linked? If not, I am sure the Tk libs are huge. Not as big as Chromium/Electron and friends, but large.


And for all that this does, it doesn’t seem to touch any accessibility remediation problems.


lol, you think sending TCP packets of your 25mb PDF to Google, so they can send it back, so you can send it to Google again in an email attachment, so they can send it to another Google server to another Google user, so that user can download it and upload it to Google, so they can print it on an 8.5" x 11" piece of paper is saving the planet?

You just sent how much wattage around the globe 100 times for what? To print the paper you already had on your screen?

You sound like the kind of people who put their very important network documentation on Google Drive, so when your network goes down you have no way to access the information required to bring it back up. I'd rather have one engineer who knows the ins-and-outs of a LAMP stack than 10 who only know how to provision cloud VMs.


How did you get "sending it to Google" from

> A few KiloBytes native program like PDFtk


So negative, just unbalanced and so negative.


It's funny to see this #1 on HN. I have a PDF converter site[0] that I did a Show HN [1] for years back, and have been pushing updates to as I work on an entire site redesign, since the PDF niche is massive. I'm relieved to see that someone actually made a package for PDF to OCR[2], and that they are using it[3]. It will finally make what I was doing less hacky.

[0] https://www.pdf.to [1] https://news.ycombinator.com/item?id=23238862 [2] https://github.com/ocrmypdf/OCRmyPDF [3] https://github.com/Frooodle/Stirling-PDF#technologies-used


It's scary how such a widely used format (PDF) is almost in full control by Adobe. I have yet to see a true competitor to Adobe Acrobat. The only one that has come really close is the one that comes built-in for macOS. It's a hidden gem.


PDF has been a free-as-in-beer standard since 1993 and a free-as-in-speech ISO standard since 2008. The reality is that PDF is open, reliable, useful, feature rich, and widely accepted. It has no serious competitors.

There's a reason why, unlike raster image formats, there aren't any serious competitors. The thing to realise about printed page file formats is that even if you set aside all of the silly "multimedia" and "interactivity" features, there's still a gargantuan rabbit hole of non-trivial features that need to be implemented absolutely perfectly, from kerning to spot color. PDF does it all very well. There's really no scope for a competitor to come along to make something that's obviously better.


Especially since there's a PDF/A standard subset for archival, which makes those documents readable decades after any other format would rot.


> PDF/A standard

Yes, the United States Library of Congress makes heavy use. I imagine their evaluation process to select a digital archive format is very tough!


>PDF has been a free-as-in-beer standard since 1993 and a free-as-in-speech ISO standard since 2008.

Yes. The first sentence of the Wikipedia article about PDF is:

>Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.

And the last sentence of the first paragraph of the same article is:

>PDF was standardized as ISO 32000 in 2008.[5] The last edition as ISO 32000-2:2020 was published in December 2020.


PDF had an open spec and oodles of programs supporting it. I don't understand where this comment is coming from.


Adobe acrobat (and maybe reader) is really the only app that fully supports the full PDF spec as understood by the authors of the spec. There are ridiculous parts of the spec that allow support for things like JS, etc.


I've seen many third party PDF viewers; I think all supported JavaScript. It's commonplace, not 'ridiculous' at all.

> Adobe acrobat (and maybe reader) is really the only app that fully supports the full PDF spec

The full spec is large and afaik has many obscure pieces, including 3-D, etc. Like many specs, they don't match reality and nobody takes completeness too seriously. For almost all users, supporting the entire PDF spec doesn't matter (does it matter for any user - does any person or organization use the entire spec over their lifetimes?).

Also, do we know that Adobe supports the entire spec?


Yeah, even Adobe doesn't really use the full spec. Or at least didn't.

There's a fairly big chunk in the spec of special presentation attributes for slideshows. When I implemented them I was surprised that slide shows produced by Acrobat didn't work. Well, obviously my implementation was buggy.

Er, no, Adobe didn't use their own slide show attributes for the slide shows produced by Acrobat. They used JavaScript instead.

Oh well. ¯\_(ツ)_/¯


That is true, but I have never encountered a PDF that is not produced by me and cannot be faithfully represented in third party PDF readers. And I give up the idea of producing those kinds of PDFs because I know the people I send to will complain about me rather than their PDF readers. So, Adobe Acrobat doesn't have any monopoly power here, since almost no one cares about those things only they can do.


We often get PDFs that do not work in our pipeline, and it's always blamed on the pipeline, not on the creating software. The user usually converts the PDF to an image with Adobe Reader and a screenshot, loads up LibreOffice, pastes it and exports it as a PDF archive.


So the PDF that does not work in your pipeline is created by LibreOffice rather than Adobe Acrobat? That doesn't seem to add any strength to the argument that "Adobe Acrobat has unusual powers because only it can handle the full spec of PDF".


No, you misread. The PDF that works is created by anything that does not use the full spec of new PDF versions. We have chosen LibreOffice because we already use it for other things. If we recreate the PDF in LibreOffice as PDF archive version it works just fine. The problem is usually a pamphlet created by some ad agency using the absolutely latest version of some layout program, neither Adobe nor LibreOffice. The PDF usually works just fine in Adobe but not in our pipeline that uses all sorts of Linux programs to process into a JPG in the format and orientation our system needs. No one has had the time or energy to fix it since most stuff works, so for now it will be downsampled by a screenshot and just shoved into the system. The added benefit is the PDF shrinks from 150 MB to 300 kB in the process.

Adobe Acrobat is the only thing that can handle all cases, yes. All other programs use (different) special cases each and most of them fail in some edge cases. It can be funny letters showing up because of fonts not working properly or images disappearing or all sorts of things. I have given up trying to fix them all. I still have a library of PDFs that we used to run through to try to get as many as possible to work.


> That is true, but I have never encountered a PDF that is not produced by me and cannot be faithfully represented in third party PDF readers.

That's because PDF is well designed and has a fallback for advanced page elements so more primitive readers can still render them.


What's wrong with using a sane subset of the spec aka PDF/A?


I don’t think it’s ridiculous to want a scriptable document, especially for complicated forms. Likewise for the other much-dragged features for 3d scenes.

https://pdfa.org/resource/pdf-in-manufacturing/ is a great usecase.


The PDF spec even supports attachments in PDF files.


How does that work exactly? Is it widely supported?

I recently had to add an embed feature to our pdf rendering, to allow users to embed other pdfs inside the one we generate for them. Since we use headless Chrome, I used pdfjs from mozilla to render the embedded pdf on screen before generating the pdf, so you can actually see and read the embedded pdf.

Works pretty well, but was wondering about this attachment feature of pdfs.


PDF is a container format and you can just shove files in there. pdftk supports this with attach_files, and at the very least the linux pdf readers I’ve used know how to deal with them.
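
A minimal Python sketch of the attach_files operation mentioned above, assuming pdftk is installed and on the PATH. The helper names and file names are hypothetical; only the `attach_files` and `output` keywords come from pdftk itself.

```python
import shutil
import subprocess


def build_attach_command(pdf_in, attachments, pdf_out):
    """Build the pdftk argv that embeds `attachments` into `pdf_in`."""
    if not attachments:
        raise ValueError("need at least one file to attach")
    return ["pdftk", pdf_in, "attach_files", *attachments, "output", pdf_out]


def attach(pdf_in, attachments, pdf_out):
    """Run pdftk; raises if pdftk is missing or exits with an error."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk not found on PATH")
    subprocess.run(build_attach_command(pdf_in, attachments, pdf_out), check=True)


# Hypothetical file names, shown only to illustrate the argv shape.
print(build_attach_command("report.pdf", ["data.csv"], "report_with_data.pdf"))
```

The reverse direction should be `pdftk report_with_data.pdf unpack_files output some_dir`, which writes the embedded files into a directory. Whether a given viewer exposes the attachments is a separate question.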


TIL!

Since these pdfs end up on whatever devices, however old, I'd rather not risk it though.


good question. how it works is an implementation question. read the PDF spec (big, and somewhat hard to grok in parts) or Google about it. I don't know. I just knew about the fact, because I have done some work with PDF. also don't know if it is widely supported, sorry.


nice idea there.


Agreed. Also, possibly the most commonly used reader is pdf.js, the FOSS component used by at least some major web browsers.


I'd guess it's the fact that nothing supports editing PDFs except Adobe Acrobat. Not to any sufficient extent. LibreOffice Draw kinda tries to do it but also corrupts the file IME.

Edit: Apparently some people can edit PDFs reliably with MacOS Preview.app, LibreOffice Draw or pdftk.


Where are people getting this information? There are many applications that support editing PDFs, more than I can remember.


Warning: Preview has a very, very nasty PDF-damaging bug. After applying a signature to a PDF in Preview, the searchable text layer becomes scrambled so that numbers are no longer searchable or copy-able. For example, “$745.25” would become something like “$;@€:-€“. The document would still look correct, but you could not search for that figure or copy it out of the PDF without instead getting the garbled version.



I find Chrome's built-in PDF viewer much snappier than Adobe Acrobat.


Sadly I had to install Adobe Reader on my father's PC again after he had documents* with formulas. Chrome would calculate the numbers wrong. Everything was off by 10.

*To get reimbursed by a union or something.


If you occasionally need Adobe Reader/Acrobat exclusive features but don't want to install, you can use the free online version of Acrobat. It's pretty decent though it doesn't have all the features:

https://acrobat.adobe.com/us/en/


I recently had issues with macOS‘ Preview.app and formulas. It’s a nice feature, but probably not widely supported.


Whoa, I had no idea PDFs can have formulas.


You can embed Javascript in PDFs



Adobe is expert at software standards. They aren't compulsive about control, yet don't give the farm away. They know when to be open and how much. That is how they dominate.


Could you give an example? PDF is an open standard controlled by ISO and has been for awhile.


It’s incredible to me that not only has Preview.app been the best non-Adobe way to use PDF’s for decades now and only on macOS (perhaps because NextStep, its roots, used PostScript natively?) but that Linux actually also seems to have better tooling in this space than Windows (where you’re pretty much stuck with Adobe Reader if you want a free solution)


Isn't SumatraPDF a decent program for Windows?

In regards to Preview, I still find it insane that it doesn't have an iOS/iPadOS equivalent. Bits of the functionality are scattered all over the place, usually in ways that don't feel as good as they do on the Mac. Sometimes I just want to open a PDF and leave it open, and not have to do it from Files which assumes I want to do something else with it than just looking.


I personally use SumatraPDF on Windows, but it's basically just a fantastic PDF viewer. It does little else in regards to editing/modifying PDFs. Even the PDF viewer in Edge does more.

But for a lightweight, bloat-free experience, SumatraPDF is the way to go.


> Preview.app been the best non-Adobe way to use PDF’s for decades now

Where is all this stuff coming from? Why would you say Preview is the best? Foxit? Nitro? There are endless PDF applications much more powerful and capable, some designed for professionals.


"Best" isn't necessarily the same as "has the most features".

I think many people find that Preview.app does everything they ever wanted to do with PDFs. It really is surprisingly capable. It's also fast and far less convoluted than most PDF tools I have seen.

And of course it comes free with every Mac, which often makes it "best" in terms of value for money.

It doesn't help that many PDF editors (including the two you mentioned) are full of the most ridiculous pricing shenanigans.


> many PDF editors (including the two you mentioned) are full of the most ridiculous pricing shenanigans.

What shenanigans? On one computer, we have a 15 year old version of Foxit running as good as new, no further licensing needed.


Pages and pages of dark patterns with the sole purpose of misleading people into buying some "plan" that nobody could possibly want at prices more expensive than the entire Microsoft Office suite.


Could you name a service that does it and give an example?

I see that much more with Adobe than with anyone, fwiw.


The Foxit PDF Editor product page is one example (not the worst by far). It suggests prominently that you have to buy an annual subscription ($109 to $159 p.a unless you can live with the cloud option for $59 p.a). Microsoft Office 365 Personal is $69.99 p.a including 1TB cloud storage.

https://www.foxit.com/pdf-editor/

The one-time license option is hidden away in a product comparison table (and linked to in a few other far less visible places).

Nitro, the other PDF editor you mentioned, appears to offer only a one-off purchase:

https://www.gonitro.com/pricing

But it says "for Windows" and at the top of the page, there's a promotion saying "Get up to 1 year subscription - free when you switch to Nitro". So there is a subscription after all?

If you keep scrolling down to the FAQ, there's a question asking:

"Is Nitro available as a subscription or a one-time purchase?

Nitro Pro, ideal for individuals and small to medium sized teams, is available as an annual subscription."

No mention of a one-time purchase option. So which is it? I'm confused. Is this "one-time purchase" a perpetual license or does it stop working after a year?

These are certainly not the most egregious examples of pricing shenanigans. But given the recent history of companies going subscription-only, this is enough uncertainty for me not to buy.


For annotating and adding a hand drawn signature to PDFs, preview really is the best: lightweight, straightforward, free, comes with the OS. I don’t know any comparable app for Linux (or windows although I rarely use it)


https://simplePDF.eu will be the closest « Preview-like » experience on Linux (and any OS really).

It’s local only (the document you load and data you fill in never leave the browser) and free

Disclosure: I’m the developer behind it


Any plans on implementing redaction?


If the background is white, it’s already possible: https://simplepdf.eu/help/faq/how-to-add-background

For other colors, it’s in the backlog!

(I’m aiming to automatically provide the correct color by inspecting pixels within the area to redact, to make it as simple as possible)


Yeah, but hiding the information and removing it are two different things. By hiding it, the information is still there, just hidden.


What a bizarre comment - Preview.app isn't even the best PDF software on Apple platforms (it's more likely something like Readdle's PDF Expert).


There is of course better software than Preview.app, but only if you are willing to pay for it. Preview.app is free.


> perhaps because NextStep, its roots, used PostScript natively?

And OS X and its successors use Display PDF natively, which is why it is trivial to save almost anything that can be displayed into a PDF file. The PDF stack that Preview.app leverages is a foundation of the OS itself.


Does that one have a name? I’m a MacOS transplant and have never gotten terribly familiar with the territory. Thanks!


Preview.app which owes its heritage to NextStep. https://en.wikipedia.org/wiki/Preview


... ah, thanks, I misread. I thought GP was talking about a document format in the same space as PDF that was native to MacOS.


Preview is the app on macOS that lets you view and even edit PDF files.


preview.app


Foxit Phantom is pretty good


It costs, I might as well pay for Acrobat.


What if Foxit works better and costs less?


That’d be interesting but I’m not the right audience. In which ways is it better than Acrobat?


IME: Much faster, far better UI. (Foxit is one option of many.)


What about Okular?


There is no support for redacting text properly on Okular.


I'm surprised no one mentioned LibreOffice Draw - it doesn't always work perfectly (I guess it doesn't support some parts of the spec), but when it does, it's by far the most powerful pdf editor I found allowing to do things like move elements around, edit them (as in actually edit, not just annotate), etc. It's cross platform and FOSS.

For page-level edits (rotating, reordering etc) pdftk in the cli (+ChatGPT to find the right incantations) works very well.
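
As a rough illustration of those page-level incantations, here is a small Python sketch that only builds pdftk `cat` commands. The helper name and file names are hypothetical, and the rotation keyword (`right`) follows the syntax documented for recent pdftk versions; older releases used different rotation suffixes.

```python
def pdftk_cat_command(pdf_in, page_specs, pdf_out):
    """Build a `pdftk ... cat ... output ...` argv from a list of page specs."""
    if not page_specs:
        raise ValueError("need at least one page spec")
    return ["pdftk", pdf_in, "cat", *page_specs, "output", pdf_out]


# Rotate every page 90 degrees clockwise relative to its current orientation.
rotate_all = pdftk_cat_command("in.pdf", ["1-endright"], "rotated.pdf")

# Move page 3 to the front and keep the rest in order.
# Assumes the document has at least 4 pages, otherwise "4-end" is invalid.
reorder = pdftk_cat_command("in.pdf", ["3", "1-2", "4-end"], "reordered.pdf")

print(" ".join(rotate_all))
print(" ".join(reorder))
```

Passing either list to `subprocess.run(..., check=True)` would execute it, provided pdftk is installed.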


Yeah, libreoffice draw and inkscape are usually my go to tools for editing pdf.


inkscape works well for single pages.

The problems with PDFs I encounter, however, are large scale 1000 page PDFs that compile PDFs from multiple sources that clearly have multiple different types of encodings, fonts, etc.

I'd love to have a pipeline that properly 'shrinks' everything. Not sure thats what this thing does, but it looks like they're moving towards configuring pipelines that could get there.


Have you tried FileOptimizer (https://nikkhokkho.sourceforge.io/static.php?page=FileOptimi...) for shrinking their size?


https://tools.pdf24.org/en/creator

This tool is not open source, but it’s free. Files should remain on the local PC. The developers claim that they make money only from advertisements on their website.


Great tool. The stirling looked exactly the same, except on a server.

I wonder why it's not open source by now.


Wow, it covers almost anything, including redacting text.


How easy or difficult would it be to turn this into an electron app so that non-technical users can use it easily too?


Dev here. We totally could; we dismissed it at first because Electron is quite bulky, containing a whole Chromium instance inside the exe. Instead we kept the exe version as small as possible. Truth is, it's not too hard to port to Electron. We have plans for a full UI version in V2. We are releasing V1 (SPDF is currently in beta) sometime this month, but have begun work on a V2 port to a different language and framework.


Quasar.dev provides a full "interoperable" solution to get to electron and others. All the code can be written as regular Vue3, then built for:

- SPAs (Single Page App)

- SSR (Server-side Rendered App) (+ optional PWA client takeover)

- PWAs (Progressive Web App)

- BEX (Browser Extension)

- Mobile Apps (Android, iOS, …) through Cordova or Capacitor

- Multi-platform Desktop Apps (using Electron)

Might be worth considering if you're going full client.


Better use existing applications like PDFsam [0] or PDF-XChange [1].

[0] https://pdfsam.org/pdfsam-basic/

[1] https://pdf-xchange.eu/pdf-xchange-editor/


Why “better use some to big else”?


It would be nice if it were WASM-based. Then someone could host that version of the app and it'd still be local processing.


Why Electron? Better go the lightweight way with Tauri, NodeGUI or DeskGap.


Any recommendations for a desktop/cli PDF optimization tool that will reduce the size of a pdf? I've tried a few and the best one so far is the one that is included in the subscription version of Adobe Acrobat. But I only need it occasionally and it is not worth paying for a $20/month sub.


I recently found out Mac's built in Preview app can do this. Go to export then change the Quartz Filter to "Reduce File Size"



If you're on a Mac, (my) PdfCompress is fairly smart about doing a good job.

Haven't updated it in while though... :-/


Did you already give ghostscript a try?


Ghostscript maybe? Depends on what you’re doing but it can downsample images etc.


Yes, it's mainly to reduce image size for scanned documents. I'll give Ghostscript a try.


Something like this is probably a good starting point:

    ghostscript -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

From this gist https://gist.github.com/guifromrio/6390547#


Very helpful tip. Thank you! - Ran it through GS and I got 30% reduction in a 50mb pdf file. I think if I play around with some options - such as converting images to grayscale, I might be able to reduce it by another 10-20%.


I guess it depends on your document, but I'm surprised you only get that much compression. For scanned to pdf documents I often get orders of magnitude.

I'm not at my computer, but try messing with the `/printer` in the above command, there are other options, (possibly `/ebook`?) that control the compression ratio from memory.
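
For reference, the preset names accepted by `-dPDFSETTINGS` are `/screen`, `/ebook`, `/printer`, `/prepress` and `/default`, with `/screen` and `/ebook` downsampling images more aggressively than `/printer`. A small Python sketch for trying them side by side follows; the helper name, the file names and the default binary name `gs` are assumptions (the command above calls it `ghostscript`, and the name varies by platform).

```python
GS_PRESETS = ("/screen", "/ebook", "/printer", "/prepress", "/default")


def gs_compress_command(pdf_in, pdf_out, preset="/ebook", gs_binary="gs"):
    """Build the Ghostscript argv from the command above with a chosen preset."""
    if preset not in GS_PRESETS:
        raise ValueError(f"unknown preset {preset!r}, expected one of {GS_PRESETS}")
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={pdf_out}",
        pdf_in,
    ]


# Print one command per preset so the output sizes can be compared by hand.
for preset in ("/screen", "/ebook", "/printer"):
    out_name = "output_" + preset.strip("/") + ".pdf"
    print(" ".join(gs_compress_command("input.pdf", out_name, preset)))
```

Running each command and comparing file sizes and legibility is probably the quickest way to pick a preset for scanned documents.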


I have a PDF problem that I thought was simple but has proven difficult to solve and there is no paid solution I’ve found…

I want to forward an email to an inbox, have the email body converted into a PDF, and then email that attachment to someone all automatically. I’ve tried Make, Zapier, pdf.co, pdftool, and a few other tools but have had no success. Has anyone solved this problem reliably?


If you are able to code or can ask someone, then you should be able to do it with some email api service (Nylas, AWS SES, etc) or headless client that gets the body of the email and convert it to pdf using wkhtmltopdf and then send it as attachment using the same service as before.

Using low/no code tools might be very hard/unlikely
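
A minimal Python sketch of the middle of that pipeline (email body to PDF), assuming the raw message bytes have already been fetched by whatever mail service is used and that wkhtmltopdf is installed. The function names are hypothetical, and fetching the mail and sending the resulting attachment are left out.

```python
import email
import html
import subprocess
from email import policy


def email_body_as_html(raw_bytes):
    """Return the message body as HTML, preferring an HTML part over plain text."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("html", "plain"))
    if part is None:
        raise ValueError("no text body found in message")
    content = part.get_content()
    if part.get_content_type() == "text/plain":
        # Wrap plain text so whitespace survives and markup characters are escaped.
        content = "<pre>" + html.escape(content) + "</pre>"
    return content


def html_to_pdf(html_text, pdf_path):
    """Pipe HTML into wkhtmltopdf; "-" tells it to read the page from stdin."""
    subprocess.run(
        ["wkhtmltopdf", "-", pdf_path],
        input=html_text.encode("utf-8"),
        check=True,
    )
```

Remote images and CSS in HTML mail are fetched at render time, so the result may differ from what a mail client shows.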


Thanks, yes I think this is the right direction. Surprised it doesn’t exist as SAAS, I guess demand isn’t there.


I'm pretty sure you can do this with a Office365 & their automation stack Power Automate.

Obviously it's only an option if your org has already sunk deep into Microsoft-of-things (MoT) universe.


If you want the pdf to look anything like the email, you will need to render it in a browser and capture a pdf. It’s not particularly hard if you know what you’re doing.


Any libs to help with that? Thanks.


A headless browser like e.g. puppeteer would do the job. I use it a lot for exactly that purpose...



Thanks for sharing!


I needed this for expensing receipts that come via email. I created an API for it where you POST the email to an endpoint and get back a PDF.

Email me and I’ll give you access for free.


I use Google apps script for it.

A filter labels a specific email.

A timed trigger runs a script.

Script fetches all emails with that filter.

Script runs in a loop. Converts each message into a blob, blob gets converted to pdf. Pdf gets saved in Google Drive. Email gets label removed.

My code was based on this https://www.labnol.org/code/19117-save-gmail-as-pdf


It seems quite doable but you'd need scripting skills to set it all up. Read the incoming queue, pass it to wkhtmltopdf then pipe the result to the mail command. For windows I believe I once used a java smtp server (apache james) that allowed you to set custom code as an incoming email handler. After that the conversion and email sending is trivial.


If you have access to O365 (or whatever it is called this week) this would be easy to do using powerautomate


Probably depends on the purpose of the pdf and why it needs to be an attachment, but I'd just skip all the steps and print the email since that's more or less what pdf is for. Print it and re-attach or just print at the destination.


This is what I currently do. I was just hoping to automate the process.


I think Mail.app in macOS can do that with Automator. At the very least, the "PDF emails coming from an email address" and "forward as attachment" bits.


Anecdotal bug warning:

Mail.app can "Export as PDF" from the File menu, but I noticed on 13.6 that it exports blank pages if the email is plain text only.

I had to choose to print the emails and then save as PDF from the print dialog.


I did something like this 10 years ago as an internal tool for a company. Back then I did it with Outlook VBA.


If you need to send an email anyway, why not print to pdf and email the pdf?


Google Apps Script can do all of this. Take the email body and put it into a Google doc, then export the doc as a pdf to drive and attach it from there to send.


I still couldn't find a tool for a difficult problem to solve. I have some magazines in PDF, with layouts in two columns, etc. I want them to be transformed into Markdown. I know, it should identify automatically the two columns, different layouts, etc.

I am not looking for something perfect - I can fix it if there are some errors, but so far nothing has come up with a good result.


I use Briss for this: https://github.com/mbaeuerle/Briss-2.0

It overlays all the pages on top of each other, you the human draws rectangles around the stacked columns in the easy GUI, and then it processes them into pages.


Have you tried this (for at least solving part of the problem)?

https://github.com/pdfcpu/pdfcpu


This can be arbitrarily difficult to do, depending on the PDF. This is generally called PDF reflowing. Another approach is to use column-aware OCR software.


Have you considered marker? It does a very good job of turning PDFs into markdown: https://github.com/VikParuchuri/marker


This is a hard problem. Cut the PDF down so it's only the pages of the article you want and then try feeding it through GPT Pro or Claude?


There's this: https://www.cp.eng.chula.ac.th/~somchai/cut2col/ Free and it works pretty well.


I bet GPT vision will do that like cutting a piece of cake. It'll even do the OCR for you and organize the text nicely.


Depending on how much you're willing to pay, the OpenAI GPT-Vision API can definitely do this extremely well.


I am willing to pay for this, I only have around ~80 files, with 30 pages each (average). Is there a quick way to test this without wasting too much on the code part?


Yes, ChatGPT will do it if you upload photos of the page.


I just tried it with the API and it worked better than expected. Now I need to find a way to convert PDF to JPG with an API, and find the best prompt to ask GPT to convert only the articles to Markdown and skip pages with columns, ads, etc. Thanks!


No problem! There are many Linux programs (some in this thread) to convert a pdf to images, and the prompt will hopefully not be too hard.

I started making an app to read our board game cards out loud (with voice) for our horror board game nights (https://boardguru.net) and GPT-4 could read cards that I couldn't make out!


>Now I need to find to convert PDF to JPG with an API

You can do that very easily with a locally installed ImageMagick. ChatGPT can help with the commands needed, but it should take just one command to convert a PDF to a number of JPGs, plus a small shell script to run it on all your files.
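A rough sketch of that batch conversion, written in Python rather than shell. It assumes ImageMagick 7 is installed as `magick` (older installs use `convert`) and that its PDF support works on your machine, which normally requires Ghostscript and a security policy that allows PDF. The function names, the 150 dpi density, the quality value, and the output naming are illustrative choices, not a recommendation.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, out_dir, density=150, binary="magick"):
    """Build the ImageMagick command that renders each PDF page to a JPG."""
    pdf = Path(pdf_path)
    # %03d is expanded by ImageMagick to the page index (000, 001, ...).
    pattern = Path(out_dir) / f"{pdf.stem}-%03d.jpg"
    return [binary, "-density", str(density), str(pdf),
            "-quality", "90", str(pattern)]


def convert_all(src_dir, out_dir):
    """Convert every *.pdf in src_dir; raises if ImageMagick is missing."""
    binary = shutil.which("magick") or shutil.which("convert")
    if binary is None:
        raise RuntimeError("ImageMagick not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        subprocess.run(build_command(pdf, out_dir, binary=binary), check=True)


# Show the command for one hypothetical file without running anything.
cmd = build_command("magazines/issue1.pdf", "pages")
print(" ".join(cmd))
```

Calling `convert_all("magazines", "pages")` would then process a whole folder; both folder names are placeholders.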


The built-in Apple Preview application does exactly what you want.

It just loses bold etc.


In case you are looking for a simple entirely in-browser PDF editor, I just released PrivatePDF - https://github.com/photown/private-pdf

It supports basic operations such as filling forms, inserting text, images, etc.

You can test it out with a sample PDF here - https://photown.github.io/private-pdf/?pdf=https://raw.githu...


Can it add attachments to PDF files? Until this year I did not even know that this was possible, but a government agency asked me to add files as attachments to a PDF, as their website only allowed uploading valid PDF files.


Have learned about it this year as well


You can use Acrobat Reader for that.



What about pdftk?


pdftk has an issue where it corrupts pdfs occasionally in modern windows server versions. Had a weird bug in a random helper service at work that we narrowed down to pdftk mangling documents sometimes. Never looked much into it since it was only a couple of hours to replace it with another tool, and haven't had issues since. I think all we used it for was merging and adding watermark text.


"replace it with another tool": which tool was that?


Nice. I've been looking for something like this to self-host, to avoid my partner uploading sensitive documents to random PDF manipulation websites.

Any better alternatives I should be considering?


If you happen to be on macOS, the Preview app does an absurd number of things to PDFs, and it does it well. To be honest I'm always surprised it isn't highlighted more by Apple, it's a great tool that pretty much always just works. You can split files, join them, rotate, add signatures, drawings, annotations, redact sections, etc. The feature list is long, especially considering that by the name of the application you'd think it could just preview files, not edit them.


> to be honest I'm always surprised it isn't highlighted more by Apple

Probably because it's not so intuitive; I have to google how to use some of the advanced features of Preview.


I often use PDF Sam (basic) and usually it works quite well and is offline.

https://pdfsam.org/


A really nice, stand-alone command line tool is pdfcpu.

https://github.com/pdfcpu/pdfcpu


Looks great, but my partner needs something more convenient.

It needs to be web based and work on desktop/mobile.


You can simply use poppler-utils on your own computer? It's a collection of command-line tools for PDF manipulation. More information can be found here: https://pypi.org/project/poppler-utils/


KDE’s Okular. Works on Linux, Windows and macOS.

If you’re already on macOS, Preview already has you covered.


Does it need to be a self hosted web based tool or do you just need PDF software? If the latter I find PDF Expert to be powerful and nice to use.


I use pdftool.org which I saw on HN a while back


Edge is surprisingly decent for marking up PDFs.


What I have mainly been looking for in the free software ecosystem is a good tool to work with PDF tagging/structure/element attributes.

At work I really have only been able to do the work I need on random PDFs with Adobe Acrobat. It seems strange that this is the case as PDF is now an open standard.


LibreOffice Draw can do that (not sure about tagging).


Bluebeam PDF has an amazing Stapler tool. I can have a job that combines various pages of various PDF files and does a few other operations on them. When time for print comes, I run the job and output a PDF. For a kind of work that has to frequently put together various pages of various PDFs repeatedly to send as draft for review, this is a tool that makes life easy. https://support.bluebeam.com/articles/revu-21-revu-configure...

However, I have yet to find an equivalent tool from any other PDF application. And that includes this one.


Looks like the pdf-lib.js used by the project can do most of the advertised features right in the browser and there is even a wasm build of tesseract out there.

Have you considered making serverless/browser-only version?


Yes, the v2 version we are working on is exactly this! We plan to migrate all functionality to run client-side, with a server-side version for API requests as well.


No one commented yet on how this entire app was built by ChatGPT?


Yep. That's the most interesting thing here. OP can you elaborate on how you got it to develop a full fledged application?


So the whole app is not made in ChatGPT. It started like that 11 months ago though: yes, I made the website and 7 PDF operations with ChatGPT as a test to investigate ChatGPT's power and applications. Everything after that has been manual though, and basically all the code has been changed by now.


Smallpdf [1] probably deserves a mention here. Not OSS and not self-hosted, but I‘ve used it occasionally and it has always worked really well. When I was running an agency, we inherited their first office – very cool folks.

[1] https://smallpdf.com/


damn, that's a huge team

https://smallpdf.com/about


I'm not surprised... I mean, just the specification for PDF 1.7 is what, ~1300 pages?

And then there is 2.0, and all the extensions [1]

And multiply that with the number of implementations.

If the goal is to make something that "always" works, you probably need a big team to keep up with the moving field of various bugs and reimplementations

[1] https://www.loc.gov/preservation/digital/formats/fdd/fdd0000...


I did a small front-end-only app for that; it's open source if you want to check it (disclaimer: I'm not a front-end dev, the UI is not good): https://timothebarbe.github.io/pdfModer/


As for self-hosted web apps, Tabula (https://tabula.technology) is a great tool to extract tables from PDF files.


I'm looking for a tool other than adobe pdf reader that lets me upload an image file of my signature to sign a pdf. Most of the tools I found let me draw a signature and I can't draw my signature on a track pad or with the mouse


I use GIMP for that, and if I need it to look like printed-signed-scanned some ImageMagick incantation [0] or https://lookscanned.io/

[0] https://news.ycombinator.com/item?id=30024658


I made myself such a tool: https://pdf.rere.re

It's js only, nothing is sent to the server. It automatically makes the background of signatures transparent. The result is a raster pdf as if you printed, signed and scanned the document. I use it on desktop, not sure if it works well on phones.


PDF Expert is the best I’ve found that also does this, and while expensive, is a really robust and well done program. PDFpen also has this ability.


Xournal++ does what you are looking for.


Firefox can actually do this as long as your signature is in an image file.


I’ll join some other commenters, to add my favorite difficult pdf problem that I haven’t found a ready to use (even paid) solution for: extract key value pairs from a filled form such as this medical claims form:

https://imgur.com/a/EJDi7L7

There are two levels of difficulty: the starting file could be an image (pdf or png or jpg), which is the most difficult scenario. The slightly easier one is where it’s a text-based pdf so no OCR is needed.

I threw this as an image file at Google Form Parser but it did poorly, i.e. it missed quite a few fields.


Dev here for the above Stirling-PDF app. Please raise features like this as a feature request in a GitHub issue and we can try to address it in future!


I would do exactly what you have done here if I were the dev of the said app. But with the luxury of being an outsider: a user has expressed an inconvenience and it seems to make sense, so if I were the dev of the app, wouldn't I go and create the ticket in whatever system, with a link to this post, instead of asking the user to follow the red tape? I know there are places where this is not incentivised, so this is a question for your org and not for you.


I see what you're saying, and for simple features I agree. However, without the OP creating the ticket there can be no feedback loop on the feature: if I wanted it tested for their use case, I would need their input and confirmation on whether it's what they wanted, improvements for the workflow, etc. If I base the whole feature on this comment it could end up only doing half a job. I'd rather have that communication loop open!


I tend to agree. As an open source dev myself, I avoid asking folks to create issues, as it puts a burden on the user. I’ve seen some highly respected open source leads do this, and I’m not faulting them, as I think they’re coming from a good place; it may be a difference of opinion on what’s best practice.


Not OP. My take is that if the requester can’t be bothered to create a GH issue, it’s likely that this isn’t really a problem for them. An annoyance possibly but has not risen to “pain” levels.


This is open source software sir, it needs multiple steps to ensure users actually need these features and are willing to use them.



Their scummy website doesn’t list their prices in any way I can see. Hard pass.


Have you tried Azure AI Document Intelligence?

In theory it's exactly this...


I second this, that or have you tried GPT-4 Vision or Donut?


Still waiting for GPT4V but doubt it will do this. Yes I’ve tried Donut and other options but this is a very gnarly problem.

One option is to extract text blocks along with their coordinates (unstructured.io gives this, probably based on another pkg because it’s basically a container for many pkgs). Then do the same with a blank template, and you then have an algorithmic problem of matching the filled values spatially with the key locations from the template.
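For what it's worth, the matching step could start as simple as a nearest-neighbour lookup. The sketch below assumes both the blank template and the filled form have already been turned into text boxes with coordinates; the `Box` type, the `match_values` function, and the sample form data are invented for illustration, and a real form would need smarter rules (cell borders, multi-line values, checkboxes).

```python
import json
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def match_values(template_labels, filled_blocks):
    """Map each template label to the filled text blocks nearest to it."""
    label_texts = {label.text for label in template_labels}
    matches = {}
    for block in filled_blocks:
        if block.text in label_texts:
            continue  # the printed label itself, not a filled-in value
        bx, by = block.center()
        nearest = min(
            template_labels,
            key=lambda lab: (lab.center()[0] - bx) ** 2
            + (lab.center()[1] - by) ** 2,
        )
        matches.setdefault(nearest.text, []).append(block.text)
    return {key: " ".join(values) for key, values in matches.items()}


# Made-up coordinates: labels from the blank template, blocks from a filled form.
labels = [
    Box("Patient name", 20, 100, 120, 110),
    Box("Date of birth", 300, 100, 400, 110),
]
filled = [
    Box("Patient name", 20, 100, 120, 110),
    Box("Jane Doe", 24, 114, 90, 126),
    Box("Date of birth", 300, 100, 400, 110),
    Box("01/02/1980", 304, 114, 380, 126),
]
mapping = match_values(labels, filled)
print(json.dumps(mapping, indent=2))
```

Squared distance between box centres is the crudest possible metric; constraining candidates to labels above or to the left of the value would likely be the first refinement.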


I'm fairly confident GPT-4V will do this just fine, tbh.

You just need to extract each of the elements into a structured JSON or something, right?

I'll try with your example later today.


Exactly, the form has filled values in named cells, so we need a JSON of cellName -> filledValue mappings.

Let me know how GPT-4V does!


I second trying GPT-4 Vision, though they have dumbed it down a bit since launch.


As long as I can fill in forms and add a signature I'm sold. I loved being able to do this on MacOS but now that I'm on Linux I still haven't found any app that can do this.


I just released a minimal browser-based PDF editor intended for your exact usecase - https://github.com/photown/private-pdf

Let me know what you think!


Firefox can actually do this as long as your signature is in an image file.


I have not looked into this yet but can someone recommend an application for repairing pdfs? For example, I have PDFs where selecting text highlights a line above or below.


That doesn’t sound like the PDF is broken, just that it uses unusual font metrics or line displacements. Tools that could amend this are unlikely to exist.

More generally, the PDF format is too flexible to decide what is “broken” or really is as intended, in many cases. It’s a bit like asking for a tool that repairs “broken” source code where it’s really just the business logic that is broken.


Some ocr no?


Try converting it to PDF/A


That's not it.


Self hosted sites are pretty awesome. Love seeing these here.


Does anybody know of a similar foss suite of pdf tools that runs as a static site only using local javascript? I would prefer that to something like this.


Does it support adding / managing named form fields?


Dev here. Not currently, but it's a planned feature.


This is really neat! Is there a paid product backing this or planned for the future? I'm curious what motivates all the dev hours going into it.


No paid backing, just running on donations at the moment. I am tempted to try add a paid feature for AI integration or something or some high end office features as I have a fair few offices that use this software now. But to be honest I would always want it free and it's just been a hobby


it says this started as a 100% chatGPT project!


What does it mean?


From my understanding they mean the code was generated by instructing OpenAI’s ChatGPT (contrary to writing the code themselves).


Can this add paragraph numbers? I see page numbers in the README, but nothing on how I can number paragraphs.


Would be really interesting to use this with paperless-ngx to annotate or watermark documents.


What seems to be missing is an OSS tool to add/remove form fields


Are there any plans to offer a CLI, as well?


No, there is an API instead; no CLI is planned.


Looks cool! ty! Checking it out!


reminds me of smallpdf.com but open source


Probability density functions, presumably. Oh, partial differential equations?

For the document files, I love PDF Studio: https://www.qoppa.com/pdfstudio/


Why can't this be an electron app?


Dev here. It totally could; we dismissed it at first because Electron is quite bulky, containing a whole Chromium instance inside the exe. Instead we kept the exe version as small as possible. We have plans for a full UI version in V2. We are releasing V1 (SPDF is currently in beta) sometime this month, but have begun work on a V2 port to a different language and framework.


As an alternative you could write some automation scripts to handle all the requirements for a self-hosted install. If you look at oobabooga's text-generation-webui [0] you can see what I mean. Although ease of install also leads to a larger user base, which can be a double-edged sword for something as ubiquitous as a PDF app. It's much easier to get people to submit issues than to help solve them.

[0] eg, windows install script https://github.com/oobabooga/text-generation-webui/blob/main...



