I think it's appropriate to link directly to Kriesel's blog¹ or his talk, as that's about the scanner creating fake data and not about RCE. Though technically that too is not an OCR bug, as there's no OCR in JBIG2.
I wonder if OCR could be improved by adding a "language model" of sorts...
Like, sure, maybe it's hard to tell apart a "1", "i", or "l" purely visually, but if you knew it was supposed to be code, I'd suspect one could significantly improve the recognition accuracy if the system just worked in the probability of each confusable option given the preceding (and following) text.
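To make that concrete, here is a minimal sketch of the idea, not a real OCR post-processor. It assumes a tiny character-trigram "language model" trained on a sample of known-good code, and re-decides each confusable glyph from the characters on either side. The names `CONFUSABLE_GROUPS`, `train_trigrams` and `fix_confusables` are made up for illustration, and the confusable list is deliberately incomplete.

```python
from collections import Counter

# Groups of glyphs that OCR tends to mix up (illustrative, incomplete).
CONFUSABLE_GROUPS = ["1il", "0O"]


def train_trigrams(corpus: str) -> Counter:
    """Count every run of three adjacent characters in known-good code."""
    return Counter(zip(corpus, corpus[1:], corpus[2:]))


def fix_confusables(ocr_text: str, trigrams: Counter) -> str:
    """Replace each confusable glyph with the option seen most often in context."""
    padded = " " + ocr_text + " "  # pad so every character has two neighbours
    out = [" "]
    for pos in range(1, len(padded) - 1):
        ch = padded[pos]
        group = next((g for g in CONFUSABLE_GROUPS if ch in g), None)
        if group is None:
            out.append(ch)
            continue
        # The original glyph goes first, so it wins any tie.
        candidates = [ch] + [c for c in group if c != ch]
        prev_ch, next_ch = out[-1], padded[pos + 1]
        best = max(candidates, key=lambda c: trigrams[(prev_ch, c, next_ch)])
        out.append(best)
    return "".join(out[1:])


# Toy training sample; a real system would need far more code than this.
SAMPLE = "int main(void) { int i = 1; while (i < 10) { i = i + 1; } return i; }"
MODEL = train_trigrams(SAMPLE)
print(fix_confusables("1nt ma1n(vo1d)", MODEL))
```

A real system would presumably combine the recognizer's own per-glyph confidence with a much stronger model than raw trigram counts, but the shape is the same: score each candidate by how well it fits the surrounding text.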
This is awesome. Using computers for what they're best at: fax and figures.
I'm curious why this requires a reply number in the program, rather than relying on something like Caller ID and sending the reply back to the number that sent the fax.
It was probably just easier to implement. The build script[1] already has the source code, extracting the number from a comment is trivial, while retrieving out-of-band data like Caller ID from the fax server is likely more complicated. For a joke it's not compelling to do that, especially if you've already been fighting the fax server...[2]
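For a sense of how little code the comment approach needs, here is a rough sketch. The `// fax-reply:` marker is an assumption for illustration only (I haven't checked what format the real build script expects), and `extract_reply_number` is a hypothetical helper, not something from the project.

```python
import re
from typing import Optional

# Hypothetical marker: a line comment such as "// fax-reply: +44 1234 567-890".
REPLY_RE = re.compile(r"//\s*fax-reply:\s*(\+?[0-9][0-9 \-]*)")


def extract_reply_number(source: str) -> Optional[str]:
    """Return the reply number from the source text, or None if there is none."""
    match = REPLY_RE.search(source)
    if match is None:
        return None
    # Drop spaces and dashes so only digits and a leading "+" remain.
    return re.sub(r"[^0-9+]", "", match.group(1))


program = "// fax-reply: +44 1234 567-890\nint main(void) { return 0; }\n"
print(extract_reply_number(program))
```

Compare that with Caller ID, where the number has to come out of the fax server's call metadata rather than out of text the script already has in hand.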
I wonder if this is a perfect use case for an LLM. I bet that if you did submit the code to Claude/ChatGPT with a prompt to "fix any typos in the code that was read using OCR", it would have a pretty high rate of success.
So, as someone who has lived in regions with pretty severe internet censorship in the past and built circumvention software back in the day, I've always pondered the idea of whether one could build a fax-based thing like this for browsing the web. Kind of like a "last resort" system.^
Could have a form that you fax in with, like a URL and session info (cookies and stuff), and then it faxes back the page, and you can circle stuff and fax the page back to interact and "click on" things.
Plus, since computers can ingest faxes, you wouldn't need to waste paper printing everything out, and could just do everything digitally. But you'd still have the option to use paper and a fax machine if you really needed to.
^: Yes, I know faxes are unencrypted and phone lines can be tapped. But I've always found the idea intriguing. Plus having some emergency point-to-point communication to bootstrap things like key exchange could still be neat.
There was a time when web browsing was crazy slow and expensive, while e-mail services were also crazy slow, but free.
There were mail to web gateways that you could e-mail a URL to, which would then reply with the contents of the web page. You'd then send another URL from that page, and get another reply, and so on. Free slow-motion web browsing.
I say "slow-motion" because this was back when getting a response to an e-mail took hours or days, not seconds. So you were lucky to get through three or four links in a day. But it was free, and we had other things to do than surf the web anyway.
If you had a ham radio connection and wanted to broadcast emergency bulletins to people, radio fax would be quite useful.
It’s push rather than pull like the web. Email works too, but fax has more utility in an emergency situation. Beats having to download Adobe Acrobat on every computer.
This is similar to the workflow for my CS101 class at college in the 70s.
I submitted my deck of cards to a person in the computer center at one of the times the PL/C compiler was scheduled to run (10 AM and 2 PM), I sat and waited, and then my output would be handed to me after it was compiled and run.
The fact that this can introduce OCR bugs into your C code is hilarious, and this is diabolical:
Source code is here: https://github.com/lexbailey/compilerfax