This is a clickbait, sensationalist headline. “Saving a PDF with Preview in Big Sur can corrupt OCR text added by a third-party program” is more accurate.
That's fair, and I honestly didn't enjoy posting the headline here, but as far as I know I have to use the original title? And the original title is from a personal blog where we talk about annoying things. We're not a professional tech blog, or a bug tracker, or… something other than our own little thing. I chose that headline not to be clickbait, but "sensationalist" is probably true because I'm personally really, really angry about this issue. I wanted to do my usual "scan all my documents for the month" routine 30 minutes before going to bed, and instead it turned into a two-hour debugging session. And I can't even use my normal workflow now, possibly until March. I find it completely unacceptable that Apple would break Preview that way again. It's not even the first time. Just thinking about it now gets me going again. That's why the headline sounds like it sounds. I would have no problem at all if it was modified here, and as I said – your assessment is absolutely fair.
Kind of agree the current title is a bit misleading.
I've never used any editing features in Preview (I mean it's called "preview" so...) and reading the title I thought this meant it was mangling files even by opening them which would have been super scary.
As for non-Acrobat software mangling PDFs after editing... Well that's much less surprising. I've even had Acrobat mangle stuff in PDFs after editing...
Preview has similar features to Microsoft’s snipping tool: highlighting, drawing basic shapes, drawing freehand with a couple of colors, and adding some annotations. I like it better than the Windows tool most of the time.
Well, technically you at least only have a chance of noticing the error after opening a PDF (again). I suppose that's because after saving the old, correct data is still in memory, but I don't know exactly when the corruption happens – would surprise me if it wasn't upon saving, though.
In my very humble opinion, it's accurate: "Saving a PDF with Preview in Big Sur [Preview in Big Sur] can corrupt OCR text added by a third-party program [is irreversibly destroying PDFs]"
I understand the (fairly common, in these comments) viewpoint that this is the fault of the "third-party program", but since the PDF is readable up until Preview touches it...I find it hard to come around to the viewpoint the third-party program is relevant. Readable bytes -> Preview -> unreadable bytes is my mental model so far.
Edit: absolutely unacceptable this is downvoted to -4. I've observed for a couple months that participation in Apple-related threads, outside indignation that Apple was involved in the discussion at all, gets down to -5 before getting back to -1 a day or two later. No matter what tone is used, this happens, and it makes the problem even worse in the long run. Been here 10 years, always been a _slight_ problem, but over the last year, it's virtually impossible to participate without continuing to slowly destroy my 11 year old account. Not sure how much longer I can keep trying.
Preview isn’t breaking files by reading them. As I understand it, people are saving files with Preview and overwriting their ABBYY-compatible PDF. Just because the last four bytes of a file name are “.pdf” doesn’t mean anything that opens files with that suffix will work.
PDF is not a bitmap, it’s a script like HTML or JS.
People understand browser incompatibility, but somehow this is unconscionable.
No, the bytes are still readable by Preview, it's just the OCR meta that is apparently no longer copy-pastable.
No reason why you can't just run this through ABBYY FineReader again and get the exact same OCR you got the first time, so I think "irreversible" is definitely a stretch.
Maybe this can give you some insight. I downvoted your comment simply because it didn't add anything to the discussion and you made an assumption that has several faults in it.
For one, your "mental model" is off because you assume that the first part of "readable bytes" is accurate. Without actually seeing the PDF in question, you don't know if the "readable bytes" are actually corrupted and Preview is fixing them to make them readable. That would mean that Preview is actually correct in its behavior and the source document is what's flawed.
On the tail end of your mental model, then, is another assumption which is that this results in "unreadable bytes" but that's not accurate either. The PDF that results from a save in Preview may be accurate to the PDF specification and is perfectly readable as a PDF in any PDF application/reader. What's no longer readable is the extra content that was originally in the file that may not have been saved correctly, in-spec, or may have been corrupt to begin with.
A big hint as to what's going on here, now that I've had some time to review this, is that the "corruption" happens consistently - the letter "a" is always replaced by the same "corrupted" character, the letter "b" seems to be consistently replaced with the same character, etc. That points, in my opinion, to a lookup that's no longer correct. What side of that lookup is bad is hard to say without seeing the file in question.
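To make the "broken lookup" idea concrete, here's a rough Python sketch of how you could check it, assuming you have the same passage copied out of the original PDF and out of the Preview-saved one. The function names (`infer_substitution`, `undo_substitution`) are made up for illustration, and this only tests whether the damage looks like a consistent one-to-one character swap. It says nothing about which side of the lookup is actually wrong.

```python
from typing import Dict, Optional


def infer_substitution(original: str, corrupted: str) -> Optional[Dict[str, str]]:
    """Return a char -> char mapping if the corruption is a consistent
    per-character substitution, otherwise None."""
    if len(original) != len(corrupted):
        return None  # lengths differ, so it is not a pure substitution
    mapping: Dict[str, str] = {}
    for src, dst in zip(original, corrupted):
        # setdefault records the first pairing and returns it on later hits
        if mapping.setdefault(src, dst) != dst:
            return None  # same source character showed up as two outputs
    return mapping


def undo_substitution(corrupted: str, mapping: Dict[str, str]) -> Optional[str]:
    """Reverse the mapping, if it is invertible, to recover the text."""
    inverse = {dst: src for src, dst in mapping.items()}
    if len(inverse) != len(mapping):
        return None  # two source characters collapsed into one output
    # Characters never seen in the sample are passed through unchanged
    return "".join(inverse.get(ch, ch) for ch in corrupted)
```

If `infer_substitution` returns a mapping for a decent-sized sample, that's consistent with a remapped lookup table rather than random damage, and in that case the copied text is at least recoverable in principle.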
The title is technically accurate, but it's misleading to non-mac users like myself. I assumed the author was using functionality called "Preview" only to view the documents, rather than save them.
There's a big difference between "read-only operation is mangling files" vs. "PDF writer is buggy".
Did they save the file using Preview? If so, that’s on them: they chose to write a PDF using Preview, and that comes with all of the pitfalls of PDF compatibility. Does plain old PostScript have this problem?
I'm not the original author. The use case I have for Preview is to open a file, read it, highlight a few things and save it (with the new highlight overlays). I wouldn't expect that to destroy my underlying OCR (which I also use a third-party app for).
If the behavior changes, that's not on me, that's on Preview.
I don't have any issue with this today on my Mac, but I'm glad I didn't upgrade to Big Sur. I almost did.
Actually, I think it's not a Preview bug. I use the Grammarly plugin in Safari, and the double copy-paste issue is happening there too. It's something more generally broken. I'm running 11.1.
Probably most people who’ve done a bit of PDF work know there’s no guarantee of the same output from different editors (or even the same one). So I don’t think it’s Preview’s fault per se, since the problem is endemic to PDF. But I don’t think you can blame the user either. Really, PDFs are just these enormously useful, complex things that are always breaking in unexpected ways, and some people haven’t been bitten by their problems enough yet to cope properly.