Um. I work with PDFs a LOT and... nah, PDFtk is pretty weak. It doesn't even do OCR!
This looks like a wonderful tool that solves a lot of problems that existing tools don't typically put together. It's an achievement, and your post is unnecessarily crass.
This is very much the sort of tool that you host internally at a newsroom so journalists don't have to wrangle with software or write code. Like, in that situation, who cares if it's on top of Docker? The users definitely won't give a shit.
Please consider that you're not the target audience...
It may not be right for you, but I can see situations where this would be preferable.
If because of your job you find yourself doing these operations very often, and the ability to do them from several devices with different OSes is valuable, it might be great to throw this on a server.
Or if your workplace has several people, maybe not very technical folks, who do them often. I’ve worked in a couple of places where this could’ve come in handy.
Also, it includes an API. Also, being open source, if in the future you’re creating a web app that needs some of these features, you could learn from/copy from its code.
I think this is a good contribution to the world.
> I don't understand what is the point of bloating softwares like this. Not even speaking of the very bad consequences for the planet.
I partially share this concern. I wouldn’t deploy this for myself unless I had a very easy way to stop and start it, but while not in use it should only consume a bit of RAM, and there’s plenty of very efficient hardware suitable for small servers these days.
A web app makes it cross-platform. If you have a homelab, you deploy it once and every client can use it.
And PDFtk doesn’t do annotations afaik which is a huge pain point on Linux (at least for me) because there are no applications that I know of to easily do things that are trivial on OSX like adding text or hand drawn signatures to PDFs. Masterpdf can do it but with a watermark and some limitations.
Maybe it doesn’t suit your particular use case but I wouldn’t say pdftk can replace this project.
Because it's a simple (for me as the user) and reliable way to get a UI without having to send my personal files to some server somewhere. What's the big deal?
No, there are easier and more lightweight ways to create a cross-platform UI than to write HTML/CSS/JS: Tkinter, wxPython, PySide6, MiniGUI, Dear ImGui, Nuklear, React Native.
Having used most of those, at least trivially, I'd have to say that HTML/CSS/JS and the frameworks on top of them are on par with the easiest of those for nontrivial cross-platform UIs, though the others may have other advantages.
Take a hypothetical organization where hundreds or more users need to perform these tasks. Is it better to push and maintain software on hundreds of computers, or on one server that is probably multi-use to begin with? Containers are far easier than maintaining software across hundreds of devices.
I’ve seen this sort of reaction quite often from people who are/were used to native (desktop) applications and grew professionally in that paradigm.
What happened is that there was a paradigm shift to web programming model, where a server and a browser interact and each has a specific role and a well defined interface to interact.
Then to address software compatibility/versioning/configuration/predictability came the paradigm of containerization.
Both paradigms are dominant now and that’s how younger programmers grew and still are growing professionally.
The overhead the paradigm introduces in terms of used resources is an acceptable and affordable price to use the paradigm but is likely to seem overpriced for people used to earlier paradigms.
A 1960s programmer may have been appalled that a compiled COBOL binary took several times more memory than a hand-coded assembly equivalent.
The next generation of programmers may be doing everything by telling ChatGPT a long story about what they want it to build.
Point is - paradigms shift, unacceptable prices become affordable.
Generational gaps manifest themselves not just between parents and children.
Because PDF is a minefield and PDFtk does not solve all problems on all platforms. You'll learn that if you try to process millions of PDFs in the wild that may or may not comply with any of the numerous specs.
lol, you think sending TCP packets of your 25mb PDF to Google, so they can send it back, so you can send it to Google again in an email attachment, so they can send it to another Google server to another Google user, so that user can download it and upload it to Google, so they can print it on an 8.5" x 11" piece of paper is saving the planet?
You just sent how much wattage around the globe 100 times for what? To print the paper you already had on your screen?
You sound like the kind of people who put their very important network documentation on Google Drive, so when your network goes down you have no way to access the information required to bring it back up. I'd rather have one engineer who knows the ins-and-outs of a LAMP stack than 10 who only know how to provision cloud VMs.
It's funny to see this at #1 on HN. I have a PDF converter site[0] that I did a Show HN [1] for years back, and I've been pushing updates to it as I work on an entire site redesign, since the PDF niche is massive. I'm relieved to see that someone actually made a package for PDF OCR[2], and that they are using it[3]. It will finally make what I was doing less hacky.
It's scary how such a widely used format (PDF) is almost fully controlled by Adobe. I have yet to see a true competitor to Adobe Acrobat. The only one that has come really close is the one built into macOS. It's a hidden gem.
PDF has been a free-as-in-beer standard since 1993 and a free-as-in-speech ISO standard since 2008. The reality is that PDF is open, reliable, useful, feature rich, and widely accepted. It has no serious competitors.
There's a reason why, unlike raster image formats, there aren't any serious competitors. The thing to realise about printed page file formats is that even if you set aside all of the silly "multimedia" and "interactivity" features, there's still a gargantuan rabbit hole of non-trivial features that need to be implemented absolutely perfectly, from kerning to spot color. PDF does it all very well. There's really no scope for a competitor to come along to make something that's obviously better.
>PDF has been a free-as-in-beer standard since 1993 and a free-as-in-speech ISO standard since 2008.
Yes. The first sentence of the Wikipedia article about PDF is:
>Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.
And the last sentence of the first paragraph of the same article is:
>PDF was standardized as ISO 32000 in 2008.[5] The last edition as ISO 32000-2:2020 was published in December 2020.
Adobe acrobat (and maybe reader) is really the only app that fully supports the full PDF spec as understood by the authors of the spec. There are ridiculous parts of the spec that allow support for things like JS, etc.
I've seen many third party PDF viewers; I think all supported JavaScript. It's commonplace, not 'ridiculous' at all.
> Adobe acrobat (and maybe reader) is really the only app that fully supports the full PDF spec
The full spec is large and afaik has many obscure pieces, including 3-D, etc. Like many specs, they don't match reality and nobody takes completeness too seriously. For almost all users, supporting the entire PDF spec doesn't matter (does it matter for any user - does any person or organization use the entire spec over their lifetimes?).
Also, do we know that Adobe supports the entire spec?
Yeah, even Adobe doesn't really use the full spec. Or at least didn't.
There's a fairly big chunk in the spec of special presentation attributes for slideshows. When I implemented them I was surprised that slide shows produced by Acrobat didn't work. Well, obviously my implementation was buggy.
Er, no, Adobe didn't use their own slide show attributes for the slide shows produced by Acrobat. They used JavaScript instead.
That is true, but I have never encountered a PDF, other than ones I produced myself, that cannot be faithfully rendered in third-party PDF readers. And I've given up on producing those kinds of PDFs, because I know the people I send them to will complain about me rather than about their PDF readers. So Adobe Acrobat doesn't have any monopoly power here, since almost no one cares about the things only it can do.
We often get PDFs that don't work in our pipeline, and it's always blamed on the pipeline, not on the creating software. The user usually converts the PDF to an image by taking a screenshot in Adobe Reader, loads up LibreOffice, pastes it in and exports it as a PDF archive.
So the PDF that does not work in your pipeline is created by LibreOffice rather than Adobe Acrobat? That doesn't seem to add any strength to the argument that "Adobe Acrobat has unusual powers because only it can handle the full spec of PDF".
No, you misread. The PDF that works is the one created by anything that doesn't use the full spec of newer PDF versions. We chose LibreOffice because we already use it for other things. If we recreate the PDF in LibreOffice as a PDF archive version, it works just fine. The problem is usually a pamphlet created by some ad agency using the absolute latest version of some layout program, neither Adobe nor LibreOffice. The PDF usually works just fine in Adobe, but not in our pipeline, which uses all sorts of Linux programs to turn it into a JPG in the format and orientation our system needs. No one has had the time or energy to fix it, since most stuff works, so for now it gets downsampled via a screenshot and just shoved into the system. The added benefit is that the PDF shrinks from 150 MB to 300 kB in the process.
Adobe Acrobat is the only thing that can handle all cases, yes. All other programs each use their own (different) special cases, and most of them fail on some edge cases. It can be funny letters showing up because fonts aren't handled properly, images disappearing, or all sorts of things. I have given up trying to fix them all. I still have a library of PDFs that we used to run through to try to get as many as possible to work.
I don’t think it’s ridiculous to want a scriptable document, especially for complicated forms. Likewise for the other much-maligned features, like 3D scenes.
How does that work exactly? Is it widely supported?
I recently had to add an embed feature to our pdf rendering, to allow users to embed other pdfs inside the one we generate for them. Since we use headless Chrome, I used pdfjs from mozilla to render the embedded pdf on screen before generating the pdf, so you can actually see and read the embedded pdf.
It works pretty well, but I was wondering about this attachment feature of PDFs.
PDF is a container format and you can just shove files in there. pdftk supports this with attach_files, and at the very least the Linux PDF readers I’ve used know how to deal with them.
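For what it's worth, here is a minimal Python sketch of the pdftk invocation, assuming pdftk is on your PATH. `attach_files` and `to_page` are pdftk's own keywords; the helper names `attach_files_cmd` and `attach_files` are made up for illustration. Without `to_page` the files are attached at the document level; with it, they are attached to that page.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def attach_files_cmd(src: str, attachments: Sequence[str], out: str,
                     to_page: Optional[int] = None) -> List[str]:
    """Build a pdftk command line that embeds `attachments` into `src`.

    Hypothetical helper; pdftk itself is invoked as
    `pdftk in.pdf attach_files a.xml b.csv [to_page N] output out.pdf`.
    """
    if not attachments:
        raise ValueError("need at least one file to attach")
    cmd = ["pdftk", src, "attach_files", *attachments]
    if to_page is not None:
        # Attach to a specific page instead of the document as a whole.
        cmd += ["to_page", str(to_page)]
    cmd += ["output", out]
    return cmd


def attach_files(src: str, attachments: Sequence[str], out: str,
                 to_page: Optional[int] = None) -> None:
    # check=True raises CalledProcessError if pdftk exits non-zero.
    subprocess.run(attach_files_cmd(src, attachments, out, to_page), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without pdftk.
    print(shlex.join(attach_files_cmd("form.pdf", ["receipt.jpg"], "form-with-receipt.pdf")))
```

Going the other way, pdftk's `unpack_files` operation extracts embedded files again.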
Good question. How it works is an implementation question: read the PDF spec (big, and somewhat hard to grok in parts) or Google it. I don't know. I just knew the feature existed because I have done some work with PDF. I also don't know if it is widely supported, sorry.
I'd guess it's the fact that nothing except Adobe Acrobat supports editing PDFs, at least not to any sufficient extent. LibreOffice Draw kinda tries to do it but also corrupts the file, IME.
Edit: Apparently some people can edit PDFs reliably with MacOS Preview.app, LibreOffice Draw or pdftk.
Warning: Preview has a very, very nasty PDF-damaging bug. After applying a signature to a PDF in Preview, the searchable text layer becomes scrambled so that numbers are no longer searchable or copy-able. For example, “$745.25” would become something like “$;@€:-€“. The document would still look correct, but you could not search for that figure or copy it out of the PDF without instead getting the garbled version.
Sadly, I had to install Adobe Reader on my father's PC again after he got documents with formulas. Chrome would calculate the numbers wrong; everything was off by 10.
If you occasionally need Adobe Reader/Acrobat exclusive features but don't want to install, you can use the free online version of Acrobat. It's pretty decent though it doesn't have all the features:
Adobe is expert at software standards. They aren't compulsive about control, yet don't give the farm away. They know when to be open and how much. That is how they dominate.
It’s incredible to me that not only has Preview.app been the best non-Adobe way to use PDF’s for decades now, and only on macOS (perhaps because NextStep, its roots, used PostScript natively?), but that Linux actually also seems to have better tooling in this space than Windows (where you’re pretty much stuck with Adobe Reader if you want a free solution).
In regards to Preview, I still find it insane that it doesn't have an iOS/iPadOS equivalent. Bits of the functionality are scattered all over the place, usually in ways that don't feel as good as they do on the Mac. Sometimes I just want to open a PDF and leave it open, and not have to do it from Files which assumes I want to do something else with it than just looking.
I personally use SumatraPDF on Windows, but it's basically just a fantastic PDF viewer. It does little else in regards to editing/modifying PDFs. Even the PDF viewer in Edge does more.
But for a lightweight, bloat-free experience, SumatraPDF is the way to go.
> Preview.app been the best non-Adobe way to use PDF’s for decades now
Where is all this stuff coming from? Why would you say Preview is the best? What about Foxit? Nitro? There are endless PDF applications that are much more powerful and capable, some designed for professionals.
"Best" isn't necessarily the same as "has the most features".
I think many people find that Preview.app does everything they ever wanted to do with PDFs. It really is surprisingly capable. It's also fast and far less convoluted than most PDF tools I have seen.
And of course it comes free with every Mac, which often makes it "best" in terms of value for money.
It doesn't help that many PDF editors (including the two you mentioned) are full of the most ridiculous pricing shenanigans.
Pages and pages of dark patterns with the sole purpose of misleading people into buying some "plan" that nobody could possibly want at prices more expensive than the entire Microsoft Office suite.
The Foxit PDF Editor product page is one example (not the worst by far). It suggests prominently that you have to buy an annual subscription ($109 to $159 p.a., unless you can live with the cloud option for $59 p.a.). Microsoft Office 365 Personal is $69.99 p.a., including 1TB of cloud storage.
But it says "for Windows" and at the top of the page, there's a promotion saying "Get up to 1 year subscription - free when you switch to Nitro". So there is a subscription after all?
If you keep scrolling down to the FAQ, there's a question asking:
"Is Nitro available as a subscription or a one-time purchase?
Nitro Pro, ideal for individuals and small to medium sized teams, is available as an annual subscription."
No mention of a one-time purchase option. So which is it? I'm confused. Is this "one-time purchase" a perpetual license or does it stop working after a year?
These are certainly not the most egregious examples of pricing shenanigans. But given the recent history of companies going subscription-only, this is enough uncertainty for me not to buy.
For annotating and adding a hand drawn signature to PDFs, preview really is the best: lightweight, straightforward, free, comes with the OS. I don’t know any comparable app for Linux (or windows although I rarely use it)
> perhaps because NextStep, its roots, used PostScript natively?
And OS X and its successors use Display PDF natively, which is why it is trivial to save almost anything that can be displayed into a PDF file. The PDF stack that Preview.app leverages is a foundation of the OS itself.
I'm surprised no one mentioned LibreOffice Draw. It doesn't always work perfectly (I guess it doesn't support some parts of the spec), but when it does, it's by far the most powerful PDF editor I've found, letting you do things like move elements around and edit them (as in actually edit, not just annotate), etc. It's cross-platform and FOSS.
For page-level edits (rotating, reordering etc) pdftk in the cli (+ChatGPT to find the right incantations) works very well.
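As an example of the kind of incantation meant here, a small Python sketch that builds a pdftk `cat` command to reorder pages and rotate some of them. It assumes pdftk 1.45 or later, where `right`, `left` and `down` are relative rotations appended directly to a page number (e.g. `2right`); `cat_cmd` is a hypothetical helper name, and the command is only printed, not run.

```python
import shlex
from typing import Dict, List, Optional, Sequence

# pdftk relative rotation keywords, appended to a page number (e.g. "2right").
RELATIVE_ROTATION = {0: "", 90: "right", 180: "down", 270: "left"}


def cat_cmd(src: str, order: Sequence[int], out: str,
            rotate: Optional[Dict[int, int]] = None) -> List[str]:
    """Build `pdftk src cat ... output out` emitting pages in `order`.

    `rotate` maps a page number to a clockwise rotation in degrees
    (multiples of 90; negative values rotate counter-clockwise).
    """
    rotate = rotate or {}
    ranges = []
    for page in order:
        deg = rotate.get(page, 0) % 360
        if deg not in RELATIVE_ROTATION:
            raise ValueError(f"page {page}: rotation must be a multiple of 90")
        ranges.append(f"{page}{RELATIVE_ROTATION[deg]}")
    return ["pdftk", src, "cat", *ranges, "output", out]


if __name__ == "__main__":
    # Move page 3 to the front and turn page 1 a quarter turn clockwise.
    print(shlex.join(cat_cmd("scan.pdf", [3, 1, 2], "fixed.pdf", {1: 90})))
    # → pdftk scan.pdf cat 3 1right 2 output fixed.pdf
```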
The problems with PDFs I encounter, however, are large-scale, 1000-page PDFs compiled from multiple sources, which clearly mix different encodings, fonts, etc.
I'd love to have a pipeline that properly 'shrinks' everything. Not sure that's what this thing does, but it looks like they're moving towards configurable pipelines that could get there.
This tool is not open source, but it’s free.
Files should remain on the local PC.
The developers claim that they make money only from advertising on their website.
Dev here. We totally could; we dismissed it at first because Electron is quite bulky, bundling a whole Chromium instance inside the exe. Instead, we kept the exe version as small as possible.
Truth is, it's not too hard to port to Electron.
We have plans for a full UI version in V2. We are releasing V1 (SPDF is currently in beta) sometime this month, but have begun work on a V2 port to a different language and framework.
Any recommendations for a desktop/CLI PDF optimization tool that will reduce the size of a PDF? I've tried a few, and the best one so far is the one included in the subscription version of Adobe Acrobat. But I only need it occasionally, and it's not worth paying for a $20/month sub.
Very helpful tip. Thank you! - Ran it through GS and I got 30% reduction in a 50mb pdf file. I think if I play around with some options - such as converting images to grayscale, I might be able to reduce it by another 10-20%.
I guess it depends on your document, but I'm surprised you only get that much compression. For scanned to pdf documents I often get orders of magnitude.
I'm not at my computer, but try changing the `/printer` in the above command; from memory, there are other options (possibly `/ebook`?) that control the compression ratio.
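For reference, a typical Ghostscript size-reduction call looks like the one this Python sketch builds. `-dPDFSETTINGS` accepts presets such as `/screen`, `/ebook`, `/printer` and `/prepress`; the lower-resolution ones (`/screen`, `/ebook`) downsample images harder and usually give the biggest savings. The grayscale flags are a commonly used incantation rather than something from this thread, so check them against your Ghostscript version. The binary is `gs` on Linux/macOS and usually `gswin64c` on Windows; `gs_shrink_cmd` and `shrink` are made-up helper names.

```python
import shlex
import subprocess
from typing import List

# Presets accepted by Ghostscript's -dPDFSETTINGS.
PRESETS = ("/screen", "/ebook", "/printer", "/prepress", "/default")


def gs_shrink_cmd(src: str, out: str, preset: str = "/ebook",
                  grayscale: bool = False, gs: str = "gs") -> List[str]:
    """Build a Ghostscript command that rewrites `src` through pdfwrite."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    cmd = [gs, "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
           f"-dPDFSETTINGS={preset}", "-dNOPAUSE", "-dQUIET", "-dBATCH"]
    if grayscale:
        # Convert all colour to gray; often shrinks scans further.
        cmd += ["-sColorConversionStrategy=Gray", "-dProcessColorModel=/DeviceGray"]
    cmd += [f"-sOutputFile={out}", src]
    return cmd


def shrink(src: str, out: str, **kwargs) -> None:
    # check=True raises if Ghostscript exits non-zero.
    subprocess.run(gs_shrink_cmd(src, out, **kwargs), check=True)


if __name__ == "__main__":
    print(shlex.join(gs_shrink_cmd("big.pdf", "small.pdf", preset="/screen")))
```

Trying each preset on a copy and comparing file sizes is the quickest way to find the right trade-off for a given document; scanned documents tend to shrink far more than born-digital ones.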
I have a PDF problem that I thought was simple but has proven difficult to solve and there is no paid solution I’ve found…
I want to forward an email to an inbox, have the email body converted into a PDF, and then email that attachment to someone all automatically. I’ve tried Make, Zapier, pdf.co, pdftool, and a few other tools but have had no success. Has anyone solved this problem reliably?
If you are able to code, or can ask someone who can, you should be able to do it with an email API service (Nylas, AWS SES, etc.) or a headless client that gets the body of the email, converts it to PDF using wkhtmltopdf, and then sends it as an attachment using the same service as before.
Using low/no-code tools might be very hard, if possible at all.
If you want the pdf to look anything like the email, you will need to render it in a browser and capture a pdf. It’s not particularly hard if you know what you’re doing.
It seems quite doable, but you'd need scripting skills to set it all up: read the incoming queue, pass each message to wkhtmltopdf, then pipe the result to the mail command. For Windows, I believe I once used a Java SMTP server (Apache James) that allowed you to set custom code as an incoming email handler. After that, the conversion and email sending are trivial.
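A rough Python sketch of that pipeline, using only the standard library for the email side and assuming wkhtmltopdf is installed and accepts `-` for stdin/stdout. How the raw message arrives (IMAP, a mail hook, a queue) is left out, the SMTP host is a placeholder, and `extract_body`, `html_to_pdf`, `build_forward` and `forward_as_pdf` are illustrative names, not an existing API. Plain-text emails are wrapped in `<pre>` so they keep their line breaks.

```python
import html
import smtplib
import subprocess
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Tuple


def extract_body(raw: bytes) -> Tuple[str, str]:
    """Return (html_body, subject) from a raw RFC 5322 message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    part = msg.get_body(preferencelist=("html", "plain"))
    if part is None:
        raise ValueError("message has no text body")
    body = part.get_content()
    if part.get_content_type() == "text/plain":
        # Escape markup characters and keep line breaks for plain-text mails.
        body = "<pre>" + html.escape(body) + "</pre>"
    return body, str(msg["Subject"] or "")


def html_to_pdf(body: str) -> bytes:
    # Assumes wkhtmltopdf on PATH; "-" "-" means read stdin, write stdout.
    result = subprocess.run(["wkhtmltopdf", "--quiet", "-", "-"],
                            input=body.encode("utf-8"),
                            capture_output=True, check=True)
    return result.stdout


def build_forward(pdf: bytes, subject: str, sender: str, to: str) -> EmailMessage:
    out = EmailMessage()
    out["From"] = sender
    out["To"] = to
    out["Subject"] = f"PDF: {subject}"
    out.set_content("The forwarded email is attached as a PDF.")
    out.add_attachment(pdf, maintype="application", subtype="pdf",
                       filename="email.pdf")
    return out


def forward_as_pdf(raw: bytes, sender: str, to: str,
                   smtp_host: str = "localhost") -> None:
    # smtp_host is a placeholder; point it at your own relay.
    body, subject = extract_body(raw)
    msg = build_forward(html_to_pdf(body), subject, sender, to)
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Inline images referenced via `cid:` won't show up in the PDF without extra work.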
Probably depends on the purpose of the pdf and why it needs to be an attachment, but I'd just skip all the steps and print the email since that's more or less what pdf is for. Print it and re-attach or just print at the destination.
Google Apps Script can do all of this. Take the email body and put it into a Google doc, then export the doc as a pdf to drive and attach it from there to send.
I still haven't found a tool for a difficult problem. I have some magazines in PDF, with two-column layouts, etc., and I want them transformed into Markdown. I know, it would have to identify the two columns, different layouts, etc. automatically.
I don't need something perfect (I can fix some errors), but so far nothing has produced a good result.
It overlays all the pages on top of each other; you, the human, draw rectangles around the stacked columns in the easy GUI, and then it processes them into pages.
This can be arbitrarily difficult to do, depending on the PDF. This is generally called PDF reflowing. Another approach is to use column-aware OCR software.
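To make "column-aware" a bit more concrete, here is a deliberately naive Python sketch of the reading-order step for a two-column page. It assumes you already have text blocks with bounding boxes (from a PDF text extractor or OCR output) in top-left-origin coordinates; PDF's own coordinate system has its origin at the bottom left, so you may need to flip y first. Blocks that straddle the page's vertical midline are treated as full-width (headlines, wide images) and split the page into bands; within each band, the left column is read before the right. `Block` and `reading_order` are hypothetical names, and real magazines break this heuristic quickly, which is the "arbitrarily difficult" part.

```python
from typing import List, NamedTuple, Sequence


class Block(NamedTuple):
    x0: float  # left
    y0: float  # top (y grows downward)
    x1: float  # right
    y1: float  # bottom
    text: str


def reading_order(blocks: Sequence[Block], page_width: float) -> List[str]:
    """Order text blocks for a simple two-column layout."""
    mid = page_width / 2
    ordered: List[Block] = []
    band: List[Block] = []  # column blocks between two full-width blocks

    def flush() -> None:
        # Emit the current band: whole left column first, then the right one.
        left = [b for b in band if (b.x0 + b.x1) / 2 < mid]
        right = [b for b in band if (b.x0 + b.x1) / 2 >= mid]
        ordered.extend(sorted(left, key=lambda b: b.y0))
        ordered.extend(sorted(right, key=lambda b: b.y0))
        band.clear()

    for block in sorted(blocks, key=lambda b: b.y0):
        if block.x0 < mid < block.x1:
            # Straddles the gutter: headline, wide caption, footer...
            flush()
            ordered.append(block)
        else:
            band.append(block)
    flush()
    return [b.text for b in ordered]
```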
I am willing to pay for this, I only have around ~80 files, with 30 pages each (average). Is there a quick way to test this without wasting too much on the code part?
I just tried it with the API and it worked better than expected. Now I need to find to convert PDF to JPG with an API, and find the best prompt to ask GPT to only convert to markdown articles and not pages with columns, ads, etc. Thanks!
No problem! There are many Linux programs (some in this thread) to convert a pdf to images, and the prompt will hopefully not be too hard.
I started making an app to read our board game cards out loud (with voice) for our horror board game nights (https://boardguru.net), and GPT-4 could read cards that I couldn't make out!
>Now I need to find to convert PDF to JPG with an API
You can do that very easily with a locally installed ImageMagick. ChatGPT can help with the commands needed, but should be just one to convert a PDF to a number of JPGs and a small shell script to run on all your files.
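Something like this Python sketch would do the batch part, assuming ImageMagick 7 (where the command is `magick`; ImageMagick 6 uses `convert` with the same arguments) plus Ghostscript, which ImageMagick relies on to read PDFs. On some Linux distributions ImageMagick's security policy disables PDF input by default, so that may need adjusting. The folder names and the 150 DPI default are arbitrary; `pdf_to_jpg_cmd` and `convert_all` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pdf_to_jpg_cmd(pdf: Path, out_dir: Path, dpi: int = 150) -> List[str]:
    """Build an ImageMagick 7 command rendering every page of `pdf` to JPG."""
    # "%03d" makes ImageMagick write one file per page: name-000.jpg, name-001.jpg, ...
    pattern = out_dir / f"{pdf.stem}-%03d.jpg"
    return ["magick",
            "-density", str(dpi),  # render resolution; must come before the input
            str(pdf),
            "-background", "white", "-alpha", "remove",  # flatten transparency onto white
            "-quality", "85",
            str(pattern)]


def convert_all(in_dir: Path, out_dir: Path, dpi: int = 150) -> None:
    if shutil.which("magick") is None:
        raise RuntimeError("ImageMagick 7 'magick' not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(in_dir.glob("*.pdf")):
        subprocess.run(pdf_to_jpg_cmd(pdf, out_dir, dpi), check=True)


if __name__ == "__main__":
    # Show the command for one file; call convert_all() to do a whole folder.
    print(pdf_to_jpg_cmd(Path("magazines/issue1.pdf"), Path("pages")))
```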
Can it add attachments to PDF files? Until this year I did not even know that this was possible, but a government agency asked me to add files as attachments to a PDF, as their website only allowed uploading valid PDF files.
pdftk has an issue where it occasionally corrupts PDFs on modern Windows Server versions. We had a weird bug in a random helper service at work that we narrowed down to pdftk sometimes mangling documents. Never looked much into it, since it only took a couple of hours to replace it with another tool, and we haven't had issues since. I think all we used it for was merging and adding watermark text.
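If you're stuck with a tool that occasionally mangles output, a cheap guard is to sanity-check every file it writes before passing it on. This Python sketch only checks that the `%PDF-` header appears near the start and the `%%EOF` marker near the end (1024 bytes is a lenient window, since many readers tolerate some junk there). It will catch truncated or empty outputs, not subtle corruption such as a damaged text layer. `looks_like_pdf` and `check_output` are illustrative names.

```python
from pathlib import Path

WINDOW = 1024  # bytes to search at each end of the file


def looks_like_pdf(data: bytes) -> bool:
    """Cheap structural check: header near the start, EOF marker near the end."""
    return b"%PDF-" in data[:WINDOW] and b"%%EOF" in data[-WINDOW:]


def check_output(path: str) -> None:
    # Raise instead of silently passing a broken file downstream.
    data = Path(path).read_bytes()
    if not looks_like_pdf(data):
        raise RuntimeError(f"{path} does not look like a complete PDF")
```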
If you happen to be on macOS, the Preview app does an absurd number of things to PDFs, and it does it well. To be honest I'm always surprised it isn't highlighted more by Apple, it's a great tool that pretty much always just works. You can split files, join them, rotate, add signatures, drawings, annotations, redact sections, etc. The feature list is long, especially considering that by the name of the application you'd think it could just preview files, not edit them.
You can simply use poppler-utils on your own computer. It's a collection of command-line tools for PDF manipulation. More information can be found here:
https://pypi.org/project/poppler-utils/
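As a small example of scripting these tools, this Python sketch calls poppler's `pdfinfo` and turns its `Key: value` output into a dict, which is handy for things like checking page counts in a batch job. The parsing splits on the first colon only, so values that contain colons (dates) survive. The sample output is illustrative, and `parse_pdfinfo` is a made-up name. Other tools in the same package include `pdftotext`, `pdfimages`, `pdfseparate` and `pdfunite`.

```python
import subprocess
from typing import Dict


def parse_pdfinfo(output: str) -> Dict[str, str]:
    """Parse `pdfinfo` output ("Key:   value" per line) into a dict."""
    info: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


def pdfinfo(path: str) -> Dict[str, str]:
    # Requires poppler-utils; check=True raises on unreadable files.
    out = subprocess.run(["pdfinfo", path], capture_output=True,
                         text=True, check=True).stdout
    return parse_pdfinfo(out)


if __name__ == "__main__":
    sample = "Title:          Report\nPages:          12\nPage size:      612 x 792 pts (letter)\n"
    print(int(parse_pdfinfo(sample)["Pages"]))  # → 12
```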
What I have mainly been looking for in the free software ecosystem is a good tool to work with PDF tagging/structure/element attributes.
At work I really have only been able to do the work I need on random PDFs with Adobe Acrobat. It seems strange that this is the case as PDF is now an open standard.
Bluebeam PDF has an amazing Stapler tool. I can set up a job that combines various pages of various PDF files and does a few other operations on them. When it's time to print, I run the job and output a PDF. For work that frequently requires putting together various pages of various PDFs to send as drafts for review, this is a tool that makes life easy. https://support.bluebeam.com/articles/revu-21-revu-configure...
However, I have yet to find an equivalent tool from any other PDF application. And that includes this one.
Looks like the pdf-lib.js used by the project can do most of the advertised features right in the browser and there is even a wasm build of tesseract out there.
Have you considered making serverless/browser-only version?
Yes, the v2 we are working on is exactly this! We plan to migrate all functionality to the client, with a server-side version for API requests as well.
So the whole app isn't made with ChatGPT?
It did start like that 11 months ago, though, yes.
I made the website and 7 PDF operations with ChatGPT as a test to investigate ChatGPT's power and applications.
Everything after that has been manual, though, and basically all the code has been changed by now.
Smallpdf [1] probably deserves a mention here. Not OSS and not self-hosted, but I’ve used it occasionally and it has always worked really well. When I was running an agency, we inherited their first office. Very cool folks.
I'm not surprised... I mean, just the specification for PDF 1.7 is, what, ~1300 pages?
And then there is 2.0, and all the extensions [1]
And multiply that with the number of implementations.
If the goal is to make something that "always" works, you probably need a big team to keep up with the moving field of various bugs and reimplementations.
I made a small frontend-only app for that. It's open source if you want to check it out (disclaimer: I'm not a frontend dev, so the UI is not good).
https://timothebarbe.github.io/pdfModer/
I'm looking for a tool other than Adobe Reader that lets me upload an image file of my signature to sign a PDF. Most of the tools I found let me draw a signature, and I can't draw my signature on a trackpad or with a mouse.
It's js only, nothing is sent to the server. It automatically makes the background of signatures transparent. The result is a raster pdf as if you printed, signed and scanned the document.
I use it on desktop, not sure if it works well on phones.
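The "make the background transparent" step can be approximated with a simple luminance threshold (this is not necessarily what that tool does internally). This dependency-free Python sketch works on a flat list of RGB pixels and returns RGBA pixels where anything lighter than the threshold becomes fully transparent; the threshold of 200 is an arbitrary starting point for a scan on white paper, and `whiten_to_alpha` is a hypothetical name. With Pillow you could feed it `img.convert("RGB").getdata()` and write the result back with `putdata` on a new `RGBA` image. A hard cutoff leaves slightly jagged edges; a soft ramp near the threshold would look smoother.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def whiten_to_alpha(pixels: Sequence[RGB], threshold: int = 200) -> List[RGBA]:
    """Make light (paper-coloured) pixels transparent and keep ink opaque."""
    out: List[RGBA] = []
    for r, g, b in pixels:
        # Integer Rec. 601 luma: 0 (black) .. 255 (white).
        luma = (299 * r + 587 * g + 114 * b) // 1000
        alpha = 0 if luma >= threshold else 255
        out.append((r, g, b, alpha))
    return out
```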
I’ll join some other commenters in adding my favorite difficult PDF problem, one I haven’t found a ready-to-use (even paid) solution for: extracting key-value pairs from a filled form, such as this medical claims form:
There are two levels of difficulty: the starting file could be an image (pdf or png or jpg), which is the most difficult scenario. The slightly easier one is where it’s a text-based pdf so no OCR is needed.
I threw this at Google's Form Parser as an image file, but it did poorly, i.e., missed quite a few fields.
I would do exactly what you have done here if I were the dev of said app. But with the luxury of being an outsider: a user has expressed an inconvenience and it seems to make sense, so if I were the dev of the app, wouldn't I go and create the ticket in whatever system, with a link to this post, instead of asking the user to follow the red tape? I know there are places where this is not incentivised, so this is a question for your org and not for you.
I see what you're saying, and for simple features I agree.
However, without the OP creating the ticket, there can be no feedback loop on the feature.
I'd want it tested for their use case, with their input, confirmation on whether it's what they wanted, improvements to the workflow, etc.
If I base the whole feature on this comment, it could end up only doing half a job. I'd rather have that communication loop open!
I tend to agree. As an open source dev myself, I avoid asking folks to create issues, as it puts a burden on the user. I’ve seen some highly respected open source leads do this, and I’m not faulting them, as I think they’re coming from a good place; it may be a difference of opinion on what’s best practice.
Not OP. My take is that if the requester can’t be bothered to create a GH issue, it’s likely that this isn’t really a problem for them. An annoyance possibly but has not risen to “pain” levels.
Still waiting for GPT-4V, but I doubt it will do this. Yes, I’ve tried Donut and other options, but this is a very gnarly problem.
One option is to extract text blocks along with their coordinates (unstructured.io gives this, probably based on another package, because it’s basically a container for many packages). Then do the same with a blank template, and you then have an algorithmic problem of matching the filled values spatially with the key locations from the template.
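A minimal Python sketch of that matching step, assuming you already have text boxes from both the blank template and the filled form on the same page size, with the top-left corner as origin. Boxes in the filled form that coincide with a template label are treated as labels; every other box is assigned to the nearest label that sits to its left or above it, within `max_dist`. Those tolerances, the `Box` type and `match_fields` are all assumptions for illustration, and real forms (checkboxes, multi-line fields, slightly shifted scans) will need more than this.

```python
import math
from typing import Dict, List, NamedTuple, Sequence


class Box(NamedTuple):
    x: float  # left edge
    y: float  # top edge (y grows downward)
    text: str


def match_fields(template: Sequence[Box], filled: Sequence[Box],
                 max_dist: float = 150.0, tol: float = 3.0) -> Dict[str, str]:
    """Map each template label to the text that appears next to it in `filled`."""

    def dist(a: Box, b: Box) -> float:
        return math.hypot(a.x - b.x, a.y - b.y)

    def is_label(box: Box) -> bool:
        # Same text at (almost) the same spot as a label on the blank form.
        return any(box.text == lab.text and dist(box, lab) <= tol
                   for lab in template)

    found: Dict[str, List[str]] = {}
    for box in filled:
        if is_label(box):
            continue
        # Labels are usually printed to the left of, or above, their value.
        candidates = [lab for lab in template
                      if lab.x <= box.x + tol and lab.y <= box.y + tol]
        if not candidates:
            continue
        best = min(candidates, key=lambda lab: dist(box, lab))
        if dist(box, best) <= max_dist:
            found.setdefault(best.text, []).append(box.text)
    return {label: " ".join(values) for label, values in found.items()}
```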
As long as I can fill in forms and add a signature I'm sold. I loved being able to do this on MacOS but now that I'm on Linux I still haven't found any app that can do this.
I have not looked into this yet but can someone recommend an application for repairing pdfs? For example, I have PDFs where selecting text highlights a line above or below.
That doesn’t sound like the PDF is broken, just that it uses unusual font metrics or line displacements. Tools that could amend this are unlikely to exist.
More generally, the PDF format is too flexible to decide, in many cases, what is “broken” and what is really as intended. It’s a bit like asking for a tool that repairs “broken” source code when it’s really just the business logic that is broken.
Does anybody know of a similar foss suite of pdf tools that runs as a static site only using local javascript? I would prefer that to something like this.
No paid backing, just running on donations at the moment.
I am tempted to try adding a paid feature, like AI integration or some high-end office features, as I have a fair few offices that use this software now.
But to be honest, I would always want it to be free; it's just been a hobby.
As an alternative, you could write some automation scripts to handle all the requirements for a self-hosted install. If you look at oobabooga's text-generation-webui [0] you can see what I mean. Although ease of install also leads to a larger user base, which can be a double-edged sword for something as ubiquitous as a PDF app. It's much easier to get people to submit issues than to help solve them.
Why would I run a docker container, a webserver, start a browser, navigate webpages... just to do some operations on a pdf locally?
A native program of a few kilobytes, like PDFtk (https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/), does the job perfectly.
I don't understand what is the point of bloating softwares like this. Not even speaking of the very bad consequences for the planet.