Edge displays “123456” in PDF but prints “114447” (microsoft.com)
142 points by noxin on May 4, 2017 | 65 comments



Just in case you don't click through, the bug submitter refers to a color PDF (not a scan, so image compression artifacts are not an issue) that is similar in appearance to a periodic table. That is, it is not the sequence of `123456` that is mistranslated into `114447`, but a sequence of 6 table cells, each containing a single digit.

It's not just the numbers that are misprinted, but the text inside those cells too, which suggests that Edge's PDF engine is re-rendering the original PDF, rather than printing the original PDF as is, which I thought was the entire point of using PDF in the first place.

But maybe this is an edge case? In the sense that Microsoft assumes that given a PDF file, if a user wants to "Print to PDF", the user should just save the PDF file. "Print to PDF" is ostensibly used to convert HTML/DOC into PDF format.


As someone who works in PDFs constantly (due to work in local government), I would say the point of a PDF is to be able to reproduce the result given a file, not to assume the file cannot be changed... you can easily edit and save PDF files using Acrobat, for example.

It is common that when you "print to PDF" you take the output of the printer and serialize that to PDF. I use this feature often on my Mac (which I think many would claim has excellent support for dealing with PDF files) to build a PDF that is stripped of any interactive forms: so as to get an output which is only the PDF "as printed".


The built-in MacOS support for PDF is great. Any app that can print to a printer can print to a PDF file. And the Preview app, despite its name, can edit PDFs to add annotations (text, graphics), add or remove pages, etc. There's also a feature by which you sign your name on a piece of paper, hold it up to the camera, and it will capture it and insert it into the document. It's convenient that these are built in and don't require paying for something bloated like Acrobat Pro.


I worked on Preview at Apple from 2008-2011 and created the signature capture feature. It's always cool to hear people talk about it!


Thank you! It was a tremendous addition to the product and I still benefit from it many times every week. It's such a great detail - very nicely done.


Man, this signature thing, plus the ability to delete selected pages from a big PDF, is a killer feature of Preview.

I cry every day I'm on Windows and need to do those simple things, which Preview made so easy while being light.

Thank you for this!


I've demonstrated and gotten literally hundreds of users to fall in love with it. Thanks for that!


That's so amazing to hear! It pretty much flew under the company radar--most people didn't find out about it until the first reviews of the Lion beta came out.

I don't think it was very common at Apple for an individual engineer to conceive, implement, and ship a feature like this. There was a general sentiment on the team of "let's do something with signatures," but we knew that very few people had scanners. We thought about touchpad input, but decided against it at the time. (That came much later, in either 10.11 or 10.12)

I was thinking, "well, almost no one has a scanner, but practically everyone using this application has a camera in their Mac." I built the initial prototype in OpenCV and then ported it to Apple's vDSP/Accelerate frameworks.

My favorite detail, which doesn't seem to be present in 10.12, is that you could just click on a horizontal line in a PDF; since I recorded the signature's offset relative to the baseline superimposed on the camera image, it would place the signature exactly on the line, with descenders nicely descending.

I've since moved on from macOS/iOS development, but I had a really positive experience at Apple. Met so many amazingly talented people there.


Thanks! It's one of my favorite features in Preview.

I used to hate PDFs before getting a Mac, but the first-class support in Preview, with features like that (along with the ability to tear out, reorder and attach pages) made me change my mind.


It's such an awesome feature. I don't use macOS as much as I used to but I will always switch to my Mac when I get a doc to sign, which is a lot.

It must have saved many many millions of sheets of paper.


To be fair, on a Microsoft Surface you would just use your pen to sign directly on the screen which is much more natural. It doesn't make any sense to me that Steve Jobs was so anti-pen, while also pushing for intuitive interaction.


Perhaps if someone added electromagnetic traction so that the screen provided some push-back to the pen tip, it would seem "natural," but as things are now, my on-screen signature is as good as a chicken scratch.


you could get a very thin sheet of translucent paper, and put it on top of the screen. the pen should still work, and you'll get a more natural feel


Providing a pen is an excuse to leave controls that are too small to use with your fingers. So while Windows now has a Touch mode where many controls are larger, many others which are also essential to using the system are tiny and hard or impossible to use by touch. That is what Steve Jobs was trying (successfully) to avoid.


That's how it works on an iPad. Touch/pen interfaces are terrible on the PC/Laptop form factor though. At best you gimp your desktop metaphor so it's functional with touch, at worst you have an unusable touch interface.


I think he was anti the crappy pens and digitising systems available at the time, the requirement to have a pen to use the device at all and the poor interface designs they encouraged.


Thanks! I use that feature almost every day. We try to be a paperless home so being able to sign PDF forms and return them without printing is one of my single favourite features of OSX. We even bought our house using that feature for all parties to sign it around my MacBook.


Thanks. It's a great feature, and one that adds a lot to the overall Mac OS experience. It's too bad that most people have no idea that it exists, since no other OS has it. But for those who know, it's a huge time and paper saver. Excellent work.


I just found out about this feature a couple of days ago, after being a Mac user for 4 years. My first thought was it's incredibly neat - shame I didn't learn about it before.


Thank you so much!


thank you! i wish apple promoted it more. what a gem.


Doesn't Quartz use PostScript for rendering the MacOS desktop?


NeXT used Display PostScript, and I would create widgets by writing PostScript code.

Quartz is like QuickTime plus something like OpenGL shaders plus something like NeXTStep "display PDF". I have no idea if this encouraged PDF integration into the display model.

Rather, I would say that NeXT, and then Apple, had some great IP and cross-licensing of PostScript and PDF display tech, and so they could ship the OS distribution with PDF as the printing model.

That is, one reason Windows might today re-render PDF → XPS → PDF is that they had needed to create display tech like PDF anyway, and so they did, and this was after humans had been playing with HTML for a while... Silverlight was pretty.


> PDF → XPS → PDF

That's not what's happening unless you have a printer which directly supports PDF. Most likely what is happening on Windows is either:

    PDF → XPS → PCL
    PDF → XPS → PostScript

Depending on the printer. Which makes far more sense and is analogous to what OS X does:

    PDF → Quartz → PCL
    PDF → Quartz → PostScript

There's really no notable difference between these two approaches as both Quartz and XPS are based on the same printing model as PDF. Don't confuse Quartz with PDF just because it offers import/export to PDF - Quartz has its own internal imaging model.


NeXTSTEP did, but Apple switched to 'Display PDF' in OS X to save licensing costs, as PDF is free to use.


Absolutely right. PDF fixes the problem Word documents have, where different versions of Word tend to render the document ever so slightly differently. Usually in a way that seems to mess up all those beautiful page breaks you meticulously planned. PDF avoids this by giving every element an absolute position.


Most people don't know you can open a PDF in Word 2016 and edit it, just like a .doc/.docx file...

Of course, reflowing the document is really painful with that source.


And is the scourge of every paper I want to read on my Kindle. Everyone publishes their PDFs and there seems to be no reliable way to reflow them.


Which is why ePub and other digital book formats exist. I'm glad there is a format that prioritizes WYSIWYG over reflow (though the part of the spec where they introduced scripting is a bit dodgy).


The "Reading Mode" in Adobe's Acrobat app for Android does a pretty good job of reflowing most of the PDFs I've thrown at it.


Should be able to easily reflow text as long as it's using newline operators (T*, ', "). Might still need some basic heuristics for paragraph breaks. But much better than the alternative of attempting to correlate individual lines of text together based on positioning.
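To make the idea concrete, here is a minimal sketch of reflowing by operator rather than by position. It assumes the content stream has already been decompressed and that strings are plain literal strings in a simple encoding, which is often not the case in real files. The function names (`tokenize`, `extract_lines`, `reflow`) and the sample stream are made up for illustration, and escape handling is deliberately simplified.

```python
def tokenize(stream):
    """Yield ("str", text) for literal strings and ("op", token) for everything else."""
    i, n = 0, len(stream)
    while i < n:
        ch = stream[i]
        if ch.isspace() or ch in "[])":
            i += 1  # skip whitespace, array brackets, stray closing parens
        elif ch == "(":
            depth, i, buf = 1, i + 1, []
            while i < n and depth:
                c = stream[i]
                if c == "\\" and i + 1 < n:
                    # Simplification: keep the escaped character literally.
                    buf.append(stream[i + 1])
                    i += 2
                    continue
                if c == "(":
                    depth += 1
                elif c == ")":
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                buf.append(c)
                i += 1
            yield ("str", "".join(buf))
        else:
            j = i
            while j < n and not stream[j].isspace() and stream[j] not in "()[]":
                j += 1
            yield ("op", stream[i:j])
            i = j


def is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False


def extract_lines(stream):
    """Group shown text into lines, breaking only at T*, ' and \"."""
    lines, current, pending = [], [], []
    for kind, value in tokenize(stream):
        if kind == "str":
            pending.append(value)
            continue
        if is_number(value):
            continue  # operands such as TJ kerning adjustments
        if value in ("T*", "'", '"'):
            lines.append("".join(current))  # these operators start a new line
            current = []
        if value in ("Tj", "TJ", "'", '"'):
            current.append("".join(pending))  # these operators show text
        pending = []
    if current:
        lines.append("".join(current))
    return lines


def reflow(stream):
    """Join the extracted lines into one paragraph."""
    return " ".join(line.strip() for line in extract_lines(stream))


SAMPLE = """
BT /F1 12 Tf 14 TL 72 720 Td
(PDF fixes the problem) Tj
T* (Word documents ) Tj [(ha) -15 (ve)] TJ
(with page breaks.) '
ET
"""

print(reflow(SAMPLE))
```

Paragraph detection would still need a heuristic on top of this, and a producer that positions every line with `Td` or `Tm` instead of the newline operators gives this approach nothing to work with.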


I use Calibre for my e-books, and it seems to do a pretty good job of reflowing PDFs (and exporting to mobi). YMMV though, especially if headers/footers are badly done.


Off-topic, but you seem like you might know: why does text copied from PDFs sometimes have messed-up spaces? It seems to guess where the spaces should go based on kerning, so with justified text, a widely-spaced line may come out with a space between each letter, while a narrowly-spaced one has no spaces at all.

(Also the thing where it inserts line breaks at the end of every print line is maddening)


That's often caused by the font specified in the PDF not being available on the platform where the PDF viewer is running, so a different font has been used instead.


Hmm. I may have been unclear--the PDF reads fine, but if I copy and paste some text into a text editor, I get the messed-up spaces. It seems as if the PDF doesn't encode text as text but just as a series of characters and locations, leaving spaces unrecorded, so when copy-pasting the reader has to guess from the distance between letters.
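A toy model of that guess, for anyone curious. This is only a sketch of the general idea, not how any particular viewer does it: glyphs carry a position and a width, spaces are never stored, and the extractor inserts a space whenever the gap to the previous glyph exceeds a fraction of that glyph's width. The `Glyph` class, `layout`, `guess_text`, and the 0.25 threshold are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge, in points
    width: float  # advance width, in points


def layout(text, letter_spacing, word_gap, width=5.0):
    """Place glyphs left to right. Spaces only move the pen; they are not stored."""
    glyphs, x = [], 0.0
    for ch in text:
        if ch == " ":
            x += word_gap
            continue
        glyphs.append(Glyph(ch, x, width))
        x += width + letter_spacing
    return glyphs


def guess_text(glyphs, gap_ratio=0.25):
    """Rebuild text, inserting a space where the gap looks wide enough."""
    out, prev = [], None
    for g in glyphs:
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)


normal = layout("edge case", letter_spacing=0.0, word_gap=3.0)
loose = layout("edge case", letter_spacing=2.0, word_gap=6.0)   # stretched justified line
tight = layout("edge case", letter_spacing=-0.5, word_gap=1.0)  # squeezed justified line

print(guess_text(normal))  # edge case
print(guess_text(loose))   # e d g e c a s e
print(guess_text(tight))   # edgecase
```

With a fixed threshold, the stretched line gets a space after every letter and the squeezed line loses its word break, which matches the symptoms described above.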


> I use this feature often on my Mac (which I think many would claim has excellent support for dealing with PDF files) to build a PDF that is stripped of any interactive forms: so as to get an output which is only the PDF "as printed".

Interesting; sort of like taking an old Fireworks .fw.png file, and then exporting it to PNG, to get rid of the Fireworks project data. Never thought of using "Print to PDF" this way!


> which suggests that Edge's PDF engine is re-rendering the original PDF, rather than printing the original PDF as is

That's to be expected. The bitmap which Edge has rendered to the screen is not what will be sent to the printer driver. Instead, rich vector graphics will be sent. On Windows, the native print format is XPS, so this is most likely a bug in how Edge converts PDF to XPS for printing.

For simpler use cases, Windows' graphics APIs can be used to render both to bitmap and to XPS, but when printing something as rich and sophisticated as PDF, better results are achieved by directly targeting the native print format, such as PCL, PostScript, or XPS. I suspect that's what the Edge devs have done and why it's producing different results on screen and in print.


To make things even more interesting, the "original" PDF seems to have been generated by Ghostscript 8.15 and PScript5.dll 5.2, that is, it was also "printed to PDF" (from Microsoft Word, I presume).


I have also encountered something similar whilst attempting to print a ticket to a major amusement park in Europe using Edge. The page the tickets were on was secured by a login mechanism, and attempting to print the tickets resulted in a page with an error. I had to save the PDF to the computer and print from there to get the proper output. It definitely seems like Edge re-renders or even re-requests the PDF before printing.


Nice pun, it's an edge case...


> Maybe this is an edge case?

Slow clap.


> But maybe this is an edge case?

It is, Edge caused case


Definitely an Edge case.


The PDF "format" never fails to amuse me. Check out the talk "OMG WTF PDF" [0] from the 27th Chaos Communication Congress, it's eye-opening.

0: https://media.ccc.de/v/27c3-4221-en-omg_wtf_pdf


You would probably like the James Mickens video "Life As A Developer: My Code Does Not Work Because I Am A Victim Of Complex Societal Factors That Are Beyond My Control"[0]. He starts talking about the Adobe PDF reader at 19:30.

[0] https://vimeo.com/180568023


I still love that the SHA 1 collision was able to change the color of an image in a PDF, due to the junk data present.


Thanks for that link.


This reminds me of JBIG2 compression errors... [1]

[1] https://abbyy.technology/en:kb:tip:jbig2_compression_and_ocr
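For context, a toy sketch of how symbol-matching compression can swap digits. JBIG2 can store one bitmap per "symbol" and reuse it for glyphs that look similar enough; if the similarity test is too loose, a different character's bitmap gets reused. The 3x5 bitmaps, the Hamming-distance test, and the names `encode`/`decode` below are illustrative assumptions, not the matching that a real encoder performs.

```python
# Tiny 3x5 bitmaps; SIX and EIGHT differ by a single pixel.
SIX = ("###", "#..", "###", "#.#", "###")
EIGHT = ("###", "#.#", "###", "#.#", "###")
ONE = (".#.", "##.", ".#.", ".#.", "###")
NAMES = {SIX: "6", EIGHT: "8", ONE: "1"}


def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(glyphs, max_diff):
    """Build a symbol dictionary, reusing any symbol within max_diff pixels."""
    dictionary, refs = [], []
    for g in glyphs:
        for idx, sym in enumerate(dictionary):
            if hamming(g, sym) <= max_diff:
                refs.append(idx)  # lossy step: reuse a look-alike symbol
                break
        else:
            dictionary.append(g)
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def decode(dictionary, refs, names):
    """Render the page back to characters from the stored symbols."""
    return "".join(names[dictionary[i]] for i in refs)


page = [SIX, EIGHT, ONE]
print(decode(*encode(page, max_diff=0), NAMES))  # 681 (exact matching)
print(decode(*encode(page, max_diff=1), NAMES))  # 661 (8 replaced by a stored 6)
```

The output still looks crisp because every glyph is a clean stored bitmap, which is what makes this kind of error hard to spot.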


Hence the joke in the bug report.


Well.. 1+2+3+4+5+6 == 1+1+4+4+4+7 --- a bug with a sense for 'numerology'!
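A quick check of that arithmetic, as a small sketch. The variable names are just for illustration; the two digit strings are the ones from the title.

```python
shown = "123456"    # what Edge displays
printed = "114447"  # what Edge prints


def digit_sum(s):
    """Sum the decimal digits of a string."""
    return sum(int(ch) for ch in s)


# Positions (0-based) where the printed digit differs from the displayed one.
changed = [i for i, (a, b) in enumerate(zip(shown, printed)) if a != b]

print(digit_sum(shown), digit_sum(printed))  # 21 21
print(changed)                               # [1, 2, 4, 5]
```

So four of the six digits change while the digit sum stays at 21, meaning a naive digit-sum check would not catch this misprint.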


This reminds me of that photocopier that changed the numbers it was copying sometimes, through a dodgy image compression algorithm.


Yes, note the tongue-in-cheek reference to that problem in the article:

> (Possible workaround: Copy the document after printing using a Xerox copier.)



The PDF goes through different rendering paths for display vs print. It's GDI+ for display, and WPF with an XPS spool file for print. So my guess is that whatever does the PDF to XPS filtering/conversion is getting something wrong; but it could be complicated by an additional bug in the print driver, which would explain why the report says the bug depends on which printer is used.


printing to pdf should be independent of printer.


I've had similar issues with Chrome's PDF viewer, where it displays one number, but if I copy and paste it, I get a different number.


I wonder if it's related to when Xerox copiers changed numbers?

http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...?


x


you're missing the joke


and ur hiding it


Maybe we really should bring https://pdfviewer.io to Windows. Looks like the default app is somewhat crap :P


Why doesn't everyone use FoxitPDF in $current_year ?


What about SumatraPDF?


IMO, the far superior option.


Last time I tried it it had ads and was barely lighter than Adobe Reader



