
Minor correction: The article says that the JBIG2 patch size might be the size of the scanned text. JBIG2 actually has the capability to detect regions of text and compress them using a specialized technique that operates on individual symbols.

I suspect Xerox is using this option and their implementation is getting confused (perhaps by the low resolution). Unless I'm greatly mistaken, the patch size for normal compression shouldn't figure here.

I was confused by that as well. From what I understand of how JBIG2 works, those symbols don't even have to be the same size everywhere (as would be quite common with proportional fonts anyway). So there is no "patch size" per se; just the low resolution confusing the classifier.
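To make the failure mode concrete, here is a toy sketch of lossy symbol matching. It is not the JBIG2 algorithm and not Xerox's implementation: the names (`hamming`, `encode`, `decode`), the pixel-difference test, and the `max_diff` threshold are all assumptions made up for illustration. It only shows how a tolerant classifier can reuse a stored glyph for a different character once the bitmaps are small enough to differ by a pixel or two, and that symbols in the dictionary need not share one size.

```python
from typing import List, Tuple

# A glyph bitmap: one string of '0'/'1' per pixel row (hypothetical representation).
Bitmap = Tuple[str, ...]

def shape(g: Bitmap) -> Tuple[int, int]:
    """Return (height, width) of a glyph bitmap."""
    return (len(g), len(g[0]))

def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count differing pixels between two bitmaps of identical shape."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def encode(glyphs: List[Bitmap], max_diff: int) -> Tuple[List[Bitmap], List[int]]:
    """Build a symbol dictionary; reuse an entry when a glyph is 'close enough'."""
    dictionary: List[Bitmap] = []
    indices: List[int] = []
    for g in glyphs:
        for i, sym in enumerate(dictionary):
            # Only same-sized symbols are compared; sizes may vary across entries.
            if shape(sym) == shape(g) and hamming(sym, g) <= max_diff:
                indices.append(i)  # lossy substitution happens here
                break
        else:
            dictionary.append(g)
            indices.append(len(dictionary) - 1)
    return dictionary, indices

def decode(dictionary: List[Bitmap], indices: List[int]) -> List[Bitmap]:
    """Rebuild the page by pasting dictionary symbols back in order."""
    return [dictionary[i] for i in indices]

# Low-resolution glyphs: "6" and "8" differ by a single pixel at 3x5.
SIX = ("111", "100", "111", "101", "111")
EIGHT = ("111", "101", "111", "101", "111")
ONE = ("1", "1", "1", "1", "1")  # narrower glyph, as with a proportional font

dictionary, indices = encode([SIX, EIGHT, ONE], max_diff=1)
print(indices)                               # [0, 0, 1]
print(decode(dictionary, indices)[1] == SIX)  # True: the "8" came back as a "6"
```

With `max_diff=0` every glyph gets its own dictionary entry and nothing is substituted, which is roughly the lossless end of the trade-off.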

I doubt the patch size is even configurable, as identified patterns can be scaled accordingly. However, the author is not to blame: JBIG2 is poorly documented, and the standard does not specify how the compressor is implemented.
