
It's the lossy jbig2 compression in Xerox copiers: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...

And yes, I think this is a relevant comparison. As the entropy model becomes more sophisticated, errors are more likely to be plausible texts with different meaning, and less likely to be degraded in ways that human processing can intuitively detect and compensate for.



> It's the lossy jbig2 compression in Xerox copiers: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_....

My understanding of this fault was that it was a bug in their implementation of JBIG2, not the actual compression? Linked article seems to support this.


I think it was just overly aggressive settings of compression parameters. I don't see any evidence that the jbig2 compressor was implemented incorrectly. Source: [1]

[1]: https://www.xerox.com/assets/pdf/ScanningQAincludingAppendix...


Right. Jbig2 supports lossless compression. I'm not very familiar with the bug, but it could have been a setting somewhere in the scanner/copier that switched it to lossy compression instead. Or they had lossy compression on by default, or misconfigured in some other way (probably a bad idea for text documents).
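
To make that distinction concrete, here's a toy sketch (hypothetical code, not Xerox's or any real JBIG2 encoder): in lossless symbol coding a glyph is only reused when its bitmap matches exactly, while a lossy setting reuses anything that is "close enough" under some threshold:

    # Hypothetical, simplified symbol matcher; bitmaps are flat tuples of 0/1 pixels.
    def similarity(a, b):
        # Fraction of pixels that agree between two equally sized bitmaps.
        return sum(x == y for x, y in zip(a, b)) / len(a)

    def find_reusable_symbol(glyph, dictionary, lossless=True, threshold=0.95):
        for symbol in dictionary:
            if lossless:
                if symbol == glyph:     # exact match only, nothing can be swapped
                    return symbol
            elif similarity(symbol, glyph) >= threshold:
                return symbol           # "close enough" -- may be a different character
        return None

With lossless matching (or a threshold of 1.0) no character can be silently replaced; with an aggressive threshold, two different characters that differ by only a few pixels at scan resolution get treated as the same symbol.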


The bad thing was that it used lossy compression when copying; that was the problem.


No. The bug occurred when using the "Scan to PDF" function. It happened on all quality settings. Copying (scanning+printing in one step, no PDF) was not affected.


I remember differently, but I don't want to pull up the source right now.

I did check some of the sources, but was not able to find the one I remember which had statistics on it.

The Xerox FAQ on it does lead me to consider that I might be confusing this with some other incident, though, as they claim that scanning is the only thing affected.


https://media.ccc.de/v/31c3_-_6558_-_de_-_saal_g_-_201412282...

I'd believe him more than any other source.


He gave his presentation in English at FrOSCon 2015; it can be seen here: https://www.youtube.com/watch?time_continue=95&v=c0O6UXrOZJo


No compression system in the world forces you to share parts of the image that shouldn't be shared. So that's true in a vacuous sense.

But the nature of the algorithm means that you have this danger by default. So it's fair to put some blame there.


This is a big rabbit hole of issues I'd never even considered before. Should we be striving to hide our mistakes by making our best guess, or to make a guess that, if wrong, is easy to detect?


The algorithm detected similar-looking patterns and replaced them with references to a single stored copy. This led to characters being changed into similar-looking characters that also appeared elsewhere on the page.
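
A minimal sketch of that substitution effect (hypothetical code, assuming a JBIG2-style symbol dictionary; not the actual encoder): the encoder keeps a dictionary of glyph bitmaps and emits references into it, so if a new glyph is judged "similar enough" to an existing entry, the decoded page shows the dictionary glyph instead of the original one:

    # Hypothetical sketch of JBIG2-style symbol substitution, not real encoder code.
    # Each scanned glyph is a (label, bitmap) pair; the label is only there to show
    # what the decoded page ends up containing.
    def similar(a, b, threshold=0.9):
        # Treat two bitmaps as the same symbol if most pixels agree.
        return sum(x == y for x, y in zip(a, b)) / len(a) >= threshold

    def encode(glyphs, threshold=0.9):
        dictionary, refs = [], []
        for label, bitmap in glyphs:
            for i, (_, stored) in enumerate(dictionary):
                if similar(bitmap, stored, threshold):
                    refs.append(i)                  # reuse the stored symbol
                    break
            else:
                dictionary.append((label, bitmap))  # new symbol
                refs.append(len(dictionary) - 1)
        return dictionary, refs

    def decode(dictionary, refs):
        return "".join(dictionary[i][0] for i in refs)

    # Toy 4x5 bitmaps where a noisy '8' differs from a '6' by only two pixels.
    six   = (0,1,1,0, 1,0,0,0, 1,1,1,0, 1,0,0,1, 0,1,1,0)
    eight = (0,1,1,0, 1,0,0,1, 0,1,1,0, 1,0,0,1, 0,1,1,0)

    dictionary, refs = encode([("6", six), ("6", six), ("8", eight)])
    print(decode(dictionary, refs))  # "666" -- crisp output, wrong number

That's what makes this failure mode nasty: the output isn't visibly degraded, it's a clean-looking page containing plausible but wrong characters.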



