
Show HN: I made a simple PDF text editor - shashanoid
https://simpdf.com/
======
jfk13
So it claims to

> Edit pdf like a word doc while preserving structure and format.

While I'm sure there are cases where this works pretty well and can be very
useful, it may be worth noting that -- just like every other tool in this
space -- there will be many PDFs where it simply can't work. It'll be hugely
dependent on exactly how the PDF-generating application/tool went about
things.

One simple example: suppose your PDF uses a specific font (not one of the
standards like Times or Helvetica), so the PDF-generating tool embedded the
font. (This is common.) Further, suppose the generating tool embedded a
re-encoded subset of the font, including only the glyphs that were actually
required. (This is also common.)

Now, suppose the edit you wish to make involves adding a character that was
not present in the original document -- let's say you want to change the date
from "May" to "June". But the original document contained no occurrences of
capital J (in this particular font/style), and so the capital J glyph is not
present in the embedded font. No "PDF text editor" can get around this; the
best you can hope for is a "J" in some fallback (such as Times) that may look
terrible alongside the intended custom font.
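
To make the subset problem concrete, here is a minimal Python sketch. It models an embedded subset font as nothing more than the set of characters it has glyphs for. `subset_chars` and `missing_glyphs` are hypothetical names made up for illustration, not any real PDF library's API, and the sample text is invented.

```python
def subset_chars(document_text: str) -> set[str]:
    """Characters a subsetting tool would keep: only those the document uses."""
    return {ch for ch in document_text if not ch.isspace()}


def missing_glyphs(embedded: set[str], new_text: str) -> list[str]:
    """Characters in new_text that have no glyph in the embedded subset."""
    return sorted({ch for ch in new_text if not ch.isspace()} - embedded)


# Hypothetical original page text, set in the custom font.
embedded = subset_chars("Invoice dated 12 May 2020")

print(missing_glyphs(embedded, "May"))   # [] : every glyph is available
print(missing_glyphs(embedded, "June"))  # ['J', 'u']
```

Any character reported as missing would have to come from a fallback font, which is where the mismatched look comes from.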

And as for edits that would require reflowing multiple lines of text, maybe
inserting a new paragraph in the middle of a page, etc.... not much chance of
this working out well.

Yes, a tool like this can (in many cases) make it possible to make _minor_
changes (perhaps fixing a typo or updating a word here and there). To suggest
that it can "edit pdf like a word doc" seems patently false to me.

~~~
saint-loup
I learned a lot from this article (discovered on HN I think):

What's so hard about PDF text extraction?
[https://www.filingdb.com/pdf-text-extraction](https://www.filingdb.com/pdf-text-extraction)

In a nutshell, PDF is fundamentally different than MS Word: it's a standard
for visual layout, without notions of paragraphs or even words.
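
As a rough illustration of what "no notion of words" means in practice, the sketch below assumes a page is just a list of positioned glyphs and guesses word boundaries from the gaps between them. The `Glyph` type, the `gap_factor` threshold, and the sample coordinates are all assumptions for the example; real extractors use more elaborate heuristics.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    char: str
    x: float      # left edge of the glyph on the page
    y: float      # baseline position
    width: float  # advance width


def group_into_words(glyphs: list[Glyph], gap_factor: float = 0.3) -> list[str]:
    """Guess word boundaries from horizontal gaps between positioned glyphs.

    A new word starts when the baseline changes or when the gap to the
    previous glyph exceeds gap_factor times that glyph's width.
    """
    words: list[str] = []
    current = ""
    prev = None
    # Read top to bottom (larger y first in PDF coordinates), then left to right.
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if g.y != prev.y or gap > gap_factor * prev.width:
                words.append(current)
                current = ""
        current += g.char
        prev = g
    if current:
        words.append(current)
    return words


# Hypothetical page content: no spaces stored, only positions.
page = [
    Glyph("H", 10, 700, 6), Glyph("i", 16, 700, 3),
    Glyph("H", 25, 700, 6), Glyph("N", 31, 700, 6),
]
print(group_into_words(page))  # ['Hi', 'HN']
```

The threshold is a guess by design: tight kerning or justified text can push gaps either side of it, which is one reason extraction results vary so much between files.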

As OP said, it doesn't mean the tool is useless. It could quite often come in
handy for me.

~~~
mehrdadn
One of the weirdest things I've seen is a PDF where the text is complete
gibberish if you copy paste, but perfect if you export it to HTML or Word in
Acrobat. Never figured out how or why that might happen.

~~~
Tsiklon
I read somewhere (likely here) that this oddness comes from the idea that PDF
is a way of structuring documents for print first, and presentation in a user
interface is secondary.

That is, the rendering of the document on screen is paramount, as opposed to
the ability to manipulate the text itself: "These characters should be
displayed at precisely this position in the document."

It would make sense that exporting the document as HTML or Word would make
this easier - as these document formats have different goals.

------
x32n23nr
Congratulations on the launch. Looks nice. For those who are cautious about
uploading sensitive PDFs, you can always just open them with Inkscape and
start editing pretty much anything in a document.

PS: I was once forwarded an application for someone who was supposed to
replace me, and I had to interview them. The expected salary was hidden by
placing a gray rectangle over it. I removed it using Inkscape and saw the
expected salary was 30% higher than what I made.

Inkscape Link: [https://inkscape.org/](https://inkscape.org/)

~~~
lowwave
Inkscape to Illustrator is what GIMP is to Photoshop.

Glad someone brought it up. For most common tasks, is there really a point in
even using Photoshop and Illustrator nowadays? Especially with the cloud
direction they are moving towards.

~~~
maaarghk
because GIMP sucks, maybe, and also because most people are required to
interoperate with the rest of the profession. i am very disappointed that
photoshop et al cannot be run under WINE :(

~~~
emayljames
I have done/still do complex composition and editing in both GIMP and
Photoshop, and there are really no missing features in GIMP. If you go into
using GIMP expecting it to be PShop, you will always be disappointed, but in
reality that is only a good thing.

------
throwawat573635
You should provide a sample pdf file so we don't have to hunt around for a
(small) pdf file just to see how it works.

------
jessmay
Wait wrapping a lib you didn’t write and that hasn’t been updated in 5 years
is now called “I made a text extractor”?

------
gnicholas
I tried using this with an invoice that I'd created using
invoice-generator.com, in the hopes that it would be an easier way to make new
invoices. When I tried to replace the To party's name, the text came back
partly bold and partly not. There was also a weird overlay on an email address
on the bottom that said something about email address protected.

Would love to have a tool like this that worked for making new invoices, among
other things!

~~~
punnerud
In the EU it is illegal to edit an invoice outside a program that keeps track
of all changes, bill number etc.

Not the same in US?

~~~
killerpopiller
at least for Germany: digital invoices shall be processed in a way that
manipulation can be ruled out (GoBD). Hence companies use special scanners and
document management systems that document the Revisionssicherheit
(audit-proof record keeping).

~~~
corty
And that is the new, more relaxed situation. At first, digital invoices were
required to be signed with a qualified signature, i.e. spend a few hundred
quid a year on certificates by a few select CAs (only the usual German
"suspects" could qualify due to intentionally onerous requirements).

------
michaelmrose
A few notes. It doesn't seem to work with documents that have multiple
columns: pushing text over just overwrites the other column. It doesn't seem
to reflow text where the source document obviously had margins, possibly
because that information is looooong gone. Hitting enter to move text to the
next line didn't move the other text; it again just seemed to overwrite it.

------
shashanoid
Hi, I bootstrapped this simple website. Let me know what you think :)

~~~
mhasbini
I tried it with a somewhat complex PDF structure and it worked like a charm +1. Would
love to learn more about the underlying techniques/tech.

~~~
hombre_fatal
[https://github.com/shashanoid/Simpdf/blob/1557bf838a8debeee1...](https://github.com/shashanoid/Simpdf/blob/1557bf838a8debeee1a6f21c12ee3e91c420b855/backend/run.py#L31)

Btw, arbitrary code execution vuln here, OP.

~~~
parhamn
Yeah. Switch to an args array and disable the shell. I hope they’re not running that
locally as the download script suggests. But then you still have a bunch of
other security issues. Shrug.
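
For anyone unsure what that fix looks like, here is a generic Python sketch. It is not the code from the linked repo: `run_tool` is a hypothetical helper, and the child process is a stand-in that simply echoes its argument so the example runs anywhere. The point is only that a list of arguments with `shell=False` hands the filename to the child as data.

```python
import subprocess
import sys


def run_tool(filename: str) -> str:
    """Run a child process with the filename as one argument, no shell involved."""
    # Stand-in command: a tiny Python child that echoes its first argument.
    # In a real backend this would be the PDF conversion tool instead.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", filename]
    result = subprocess.run(
        cmd, shell=False, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# A hostile "filename" is passed through as plain data, not interpreted.
print(run_tool("report.pdf; echo pwned"))  # report.pdf; echo pwned
```

If the same string were interpolated into a single command line and run with `shell=True`, the shell would treat `; echo pwned` as a second command. As noted, this closes only that one hole; validating uploads and sandboxing the tool are separate problems.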

------
longtom
Unfortunate name:
[https://www.urbandictionary.com/define.php?term=Simp](https://www.urbandictionary.com/define.php?term=Simp)

~~~
userbinator
Fortunately, I don't think many people will be offended by the name...

...or not as much as The Gimp, at least.

------
smhmd
I tried it with a relatively complex pdf and it blew my mind.

------
ray991
I really wish the PDF layout was easier to parse. No matter which library you
use, you always run into edge cases which make text selection and extraction
an issue on certain files. I was recently extracting financial data from a
bank which provides only PDFs and every time they changed the format just a
little bit I had to change large parts of my code to extract the transactions
I wanted.

~~~
jfk13
PDF is designed to present a human-readable document, not to serve as a data
interchange format.

------
illender
@op line 105 on
[https://github.com/shashanoid/shashwatsingh.github.io/blob/m...](https://github.com/shashanoid/shashwatsingh.github.io/blob/master/index.html)
is missing an 'l' and will cause you to not get your email from your personal
web link

------
dheera
Now I just wish there were a version that could run locally. All the Linux PDF
viewers suck -- they can't even save a fill-in PDF form or insert a signature.

~~~
mrb
I spent an afternoon trying a bunch of them some years ago. I settled on the
freeware Master PDF Editor version 4 (version 5 inserts a watermark unless you
buy a license).
[https://code-industry.net/masterpdfeditor/](https://code-industry.net/masterpdfeditor/)

It is super lightweight and opens any complex PDF. Can insert signatures, can
edit anything in the PDF (without changing the font even if the font is
embedded in the PDF—as long as all the glyphs you are typing are present in
the font), etc. My only complaint is that it won't edit an encrypted PDF, but
a one-liner Ghostscript command can remove the encryption automatically:
[https://gist.github.com/compleatang/6046249](https://gist.github.com/compleatang/6046249)
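
I have not reproduced the gist here, but the usual approach is to have Ghostscript re-write the file through its `pdfwrite` device. The sketch below only builds that command as an argument list. `gs_rewrite_command` and `rewrite_pdf` are hypothetical helper names, the file names are placeholders, and it assumes a `gs` binary on the PATH. Whether the output really comes out unprotected depends on how the PDF was secured; a file that needs a password to open would need that password supplied as well.

```python
import subprocess


def gs_rewrite_command(src: str, dst: str) -> list[str]:
    """Build a Ghostscript command that re-writes src to dst via pdfwrite."""
    return [
        "gs",
        "-q",                    # suppress startup messages
        "-dNOPAUSE", "-dBATCH",  # run non-interactively and exit when done
        "-sDEVICE=pdfwrite",     # output device that writes a new PDF
        f"-sOutputFile={dst}",
        src,
    ]


def rewrite_pdf(src: str, dst: str) -> None:
    """Run the command; requires the `gs` binary on PATH."""
    subprocess.run(gs_rewrite_command(src, dst), check=True)


# Show the command without running it.
print(" ".join(gs_rewrite_command("encrypted.pdf", "unencrypted.pdf")))
```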

------
yodaarjun
Hacker News traffic crashed the host :( Can't wait to try tho!

------
kyawzazaw
I love it. How do I save it?

~~~
jnlar
^ this

~~~
dheera
there's a "Save and Download" option if you mouseover to the left.

~~~
zoid_
I found this too, didn't seem too obvious at first though.

------
top_kekeroni_m8
I'll be honest, I thought the website was called simp df.. lol

------
mamurphy
The website isn't loading anything for me right now.

~~~
salutonmundo
I am getting a Cloudflare 522.

------
schoolornot
A little glitchy with complex PDFs but WOW, amazing work!

------
Hoasi
Neat, would be useful for students' homework.

------
oxbridge
Works, but it changes the font type after editing.

