I recently discovered that - incredibly - PDFs that consist only of images are not OCR’d automatically on macOS.
If you have an image, the system-level “Live Text” feature will allow you to select and copy text. That’s really great. But if you have a PDF - you’re stuck.
That’s part 1 of this project. Part 2 was seeing that most? all? websites that offer PDF-to-OCR conversion require you to completely trust them, because you need to upload your file to their server. That seems… not great.
Part 3 was finding tesseract-wasm, an amazing project that I combined with the just-as-amazing pdf.js. That’s it! Thanks, everyone!
Source code - https://github.com/gregsadetsky/pdf-to-ocr-but-no-servers