Show HN: Convert Your PDF File to Text with No Servers (greg.technology)
3 points by gregsadetsky on Oct 17, 2023
I recently discovered that - incredibly - PDFs that consist entirely of images are not OCR'd automatically on macOS.

If you have an image, the system-level “Live Text” feature will allow you to select and copy text. That’s really great. But if you have a PDF - you’re stuck.

That’s part 1 of this project. Part 2 was seeing that most (all?) websites that offer PDF-to-text OCR conversion require you to trust them completely, since you need to upload your file to their server. That seems… not great.

Part 3 was finding tesseract-wasm, an amazing project that I combined with the just as amazing pdf.js, so that both the PDF rendering and the OCR run entirely in the browser. That’s it! Thanks, everyone.
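
For anyone curious how the two pieces fit together, here is a minimal sketch of the per-page loop: rasterize each page, hand the pixels to the OCR engine, collect the text. It is not the code from the repo. The `RasterizedPdf` and `OcrEngine` interfaces and the `ocrPdfPages` function are hypothetical names made up for this illustration, and only the two OCR methods are meant to mirror the shape of tesseract-wasm's client.

```typescript
// Hypothetical interface: anything that can turn page `n` (1-based) into an image.
// In a browser this could wrap pdf.js rendering a page onto a canvas.
interface RasterizedPdf<Img> {
  numPages: number;
  renderPage(n: number): Promise<Img>;
}

// Hypothetical interface: the two OCR calls this sketch needs.
// Assumed to match the shape of tesseract-wasm's client (loadImage, then getText).
interface OcrEngine<Img> {
  loadImage(image: Img): Promise<void>;
  getText(): Promise<string>;
}

// Runs OCR over every page in order and returns one string per page.
// Pages are processed sequentially because the engine holds one image at a time.
async function ocrPdfPages<Img>(
  doc: RasterizedPdf<Img>,
  ocr: OcrEngine<Img>,
): Promise<string[]> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const image = await doc.renderPage(n);
    await ocr.loadImage(image);
    pages.push(await ocr.getText());
  }
  return pages;
}
```

In the browser, `renderPage` would typically get the page from pdf.js, draw it onto a canvas at some scale, and return the canvas pixels, while the OCR engine would be a tesseract-wasm client with a language model already loaded. Nothing here touches the network after the scripts and model have loaded, which is the whole point.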

Source code - https://github.com/gregsadetsky/pdf-to-ocr-but-no-servers


