Print this file, your printer will jam (2008) (nedbatchelder.com)
248 points by JoshTriplett on Aug 14, 2014 | 66 comments

Reminded me of “The case of the 500-mile email”: http://www.ibiblio.org/harris/500milemail.html

Or the "Car allergic to vanilla ice cream" tale.


Or the "OpenOffice can't print on Tuesdays" bug https://bugs.launchpad.net/ubuntu/+source/cupsys/+bug/255161...

That one reminds me of the system I worked on years ago that wouldn't accept credit cards that expired in August or September (of any year).

I bet it was a JavaScript system that tried to interpret month numbers of 08 and 09 as octal because of the leading zero. Am I close?
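If that guess is right, the failure mode would look roughly like the sketch below. It is only an illustration in Python of C-style base auto-detection, which older JavaScript engines applied in `parseInt` when no radix was given; `legacy_parse_int` is a hypothetical helper, not code from the system described above.

```python
def legacy_parse_int(text: str) -> int:
    """Parse an integer the way a radix-guessing parser would:
    a leading "0x" means hex, a leading "0" means octal."""
    digits = "0123456789abcdef"
    text = text.strip().lower()
    if text.startswith("0x"):
        base, body = 16, text[2:]
    elif text.startswith("0") and len(text) > 1:
        base, body = 8, text[1:]
    else:
        base, body = 10, text
    value = 0
    for ch in body:
        d = digits.find(ch)
        if d < 0 or d >= base:
            break  # stop at the first character that is not a valid digit
        value = value * base + d
    return value


# Months as they might arrive from a two-digit form field.
for month in ["07", "08", "09", "10"]:
    print(month, "->", legacy_parse_int(month))
```

"07" still parses as 7, but "08" and "09" both come out as 0 because 8 and 9 are not octal digits, so a month-range check would reject exactly August and September. Passing an explicit base of 10 avoids the problem.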

That's clearly a tale rather than a true event. The guy would have noticed during other extremely short trips that his car wouldn't start afterwards.

I was impressed that the users were able to troubleshoot the fault that specifically in the first place.

Also, that it's pretty clear that they were only able to do so because they had limited technical knowledge of email: to someone who runs a mail server, the idea of email not getting further than 500 miles is ridiculous and absurd. To someone not used to it, not so much...

I'm sure there's a cute name for this specific variety of irony, but I don't know what it is.

I think it had a little bit to do with the users being statistics professors, too. That might just be the ideal background for writing bug reports.

> to someone who runs a mail server, the idea of email not getting further than 500 miles is ridiculous and absurd

Not really, unless that someone doesn't know about the structure of networks. My first two hypotheses after reading the problem description were routing anomalies and timeouts.

That other story about the car and vanilla ice cream, on the other hand, is a great example of unintended correlation != causation.

Well, it did involve people with the job title "geostatistician".

Fun article, but I don't completely buy the conclusion. The propagation speeds of Cat 5, coax, and fiber optic cable all fall in the neighborhood of 60-70% of the speed of light in vacuum. Also, it's very unlikely that you'd have anything resembling a straight-line path from the mail server to the destination. If the connections were aborted in a bit over 3 ms, you would have a much shorter range than 500 miles; I'd guess something like 1/4 to 1/2 that distance, given the lower propagation speed and the indirect path.

The timeout was unlikely to be 3 ms. A 100 Hz clock would give you a timeout of 10 ms. Also, the timeout needs to cover the distance in both directions. Half of 10 ms is 5 ms, which is just about enough for a transmission at 60% of c to go 500 miles.
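The arithmetic in the two comments above can be checked with a short sketch. It assumes a perfectly straight cable path and a single velocity factor for the whole route, both simplifications; `reach_miles` and its parameters are names made up for this example.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
KM_PER_MILE = 1.609344


def reach_miles(timeout_ms: float, velocity_factor: float,
                round_trip: bool = True) -> float:
    """Distance a signal can cover before the timeout fires.

    velocity_factor is the propagation speed as a fraction of c.
    With round_trip=True the timeout has to cover the way there and back.
    """
    one_way_seconds = timeout_ms / 1000.0
    if round_trip:
        one_way_seconds /= 2
    km = one_way_seconds * velocity_factor * SPEED_OF_LIGHT_KM_PER_S
    return km / KM_PER_MILE


# 10 ms timeout, round trip, 60% of c: roughly 559 miles.
print(round(reach_miles(10, 0.6)))
# 3 ms timeout, round trip, 60% of c: roughly 168 miles.
print(round(reach_miles(3, 0.6)))
```

Under these assumptions a 10 ms timeout gives a reach in the 500-mile range, while a 3 ms timeout gives about a third of that, before any allowance for an indirect route.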

Read the FAQ, linked at the top.

That was a great post and well worth the read.

One of my first jobs was QA for some printer-scanner-copier machines at HP. My favorite bug that I found involved an 8x10 photo of my girlfriend at the time: scanning it bricked the device. We were never able to find another way to reproduce it, so that photo ended up in the regression suite's archive of assets (maybe it's still there).

The relationship didn't last, so the moral of the story ended up being: always trust software defects when it comes to relationships. This turned out to be especially true later when a pretty egregious Myspace bug introduced me to my wife :)

The notion of a printer/scanner group, or producer, or team doing QA is one that makes me want to laugh, then cry.

Printers are such unimaginable sources of pain, such a crippling tax on everyone forced to interact with them in any way, that it's hard to imagine anyone working to produce them in any standard, structured way.

"Printers are such unimaginable sources of pain"

Well either you've a severely lacking imagination or have had a really comfortable life. I hope for the latter :)

Kidding though, yeah printers are frustrating and cobbling together a printer-fax-copier-scanner into one device is even more terrible. HP was really serious about process though! The test group was literally bigger (more people) than the dev team.. by probably a factor of almost 2.

HP was really big on Model Based Testing [1] and we read books and maybe even had a class on it when joining. The QA team was all in all pretty high functioning, if memory serves.. but the products would still get shipped with four figures of open bug reports (I am not kidding or exaggerating).

The sort of bugs at the start of the cycle were downright terrifying, too. If your printer has never caught fire, or vomited all of its ink on your desk in the middle of the night... thank your friendly HP QA employee!

Mostly in my era the problems with the products were that they were done on a pretty tight schedule (6-12 months) for what tended to amount to more complex/new hardware than folks would initially expect. The testing cycle also started pretty late because the initial "get it barely working" cycle ate up a couple-few months, and the non-firmware software (drivers and crapware utilities) got very little attention by the a-players on the team.

[1] http://en.wikipedia.org/wiki/Model-based_testing

Printers were sent from hell to make us miserable: http://theoatmeal.com/comics/printers

Do you still have the picture? I would love to test it out on one of the old scanners I have. Also, maybe HN could identify the reason for the bug.

Sorry, I don't. At the time I wasn't super technical (started writing automated tests soon after and then landed in development). It was some form of buffer overrun.

Unrelated to the topic at hand: is your username a reference to Hedy Lamarr?

No. Just a totally ridiculous random handle I imagined up when I needed a throwaway account years ago that accidentally stuck. It's pretty embarrassing when retail stores ask me for my email to send me a receipt.

Reminded me of a problem I was debugging ages ago. The initial complaint was that a particular file transfer through a firewall was hanging. Further tests: it always hangs at the same place.

More testing: not just this file, some other file transfers from unrelated places as well - and always at the same places within the file (but different across different files).

Take a tcpdump on both sides - it's always the same segment of data within each of the file transfers that does not make it through the PIX. Take the offending segment and convert it into a small file of its own - this file is impossible to download through the firewall; it gets dropped.

Needless to say, this all was observable only on that particular setup - not in the lab.

Finally I noticed that the CRC error counter on the inbound interface increments by one every time I try to push through the offending small file.

Replacing the Ethernet cable connecting that interface solved the problem with all of the "hanging" transfers.

We did not do any further research into the root cause (the user did not want to put back the previous cable), but the working theory was that the initial cable was badly made, though not badly enough to fail outright, and the fault only showed up on particular sequences of data.

Given that neither 10BaseT nor 100BaseT used scrambling, this seemed plausible enough of a theory, but was quite fascinating nonetheless.

Three fun connectivity issues I've diagnosed:

1) Problem report was: "I can login to remote host fine, but my interactive session hangs when I type 'ls -l'." Some diagnosis indicated the hanging only occurred when packets were above a certain size. This only happened when a command was run that had a large amount of streaming output. Turned out our provider's frame relay network had become misconfigured. Had to reduce the MTU at both ends as a workaround until I was able to convince our provider that something was wrong with their frame relay.

2) My home Linux box unable to talk to a Verizon pager gateway, but my Mac was fine. Turned out Verizon's firewall was dropping packets that had the ECN bit set. Had to disable ECN on the Linux box.

3) A Solaris host that was sporadically reachable over the LAN. It could always connect to other hosts, but other hosts could only sometimes reach it. Turns out you can disable ARP replies on Solaris by deleting the "publish" entry for the box's own MAC addr from its ARP table. And someone or something had done that by accident. So the box was only reachable on the LAN till its MAC timed out of the other hosts' ARP caches. However, if the host in question first initiated a connection to another host, the destination host would then learn its MAC again.

Troubleshooting the issue similar to (1) was one of my interview questions for a while - though with HTTP. It's a treasure trove of discussion at very different levels.

Another "fun" one (and hard to debug the first time) is the class of problems triggered by duplex mismatch, though the asymmetric ~1/10 difference in performance quickly becomes a signature once you've met it once or twice. It's also a good anchor from which you can go into many types of discussions.

Something very similar to (3) I've seen manifested in IPv6, due to Neighbor Discovery working over multicast and the potentially asymmetric ways multicast propagates within the L2 segment (different L2 switches on the two sides, etc.).

But it's very interesting to learn about this asymmetry in the Solaris stack, thanks!

I had the same issue with file transfers from a Windows client to a file server. They always stopped if the file was larger than a few MB. It turned out the Ethernet patch cable was damaged (the CRC error counter was incrementing); replacing it solved the issue.

Another spooky issue was that Skype on my iPad stopped working every 10 days. It turned out that I had accidentally set the WiFi switch to a static IPv4 address on a LAN with a DHCP server (and I powered off the WiFi switch every night). So every few days another device on the network got the dynamic IP address (from DHCP) that I had also set statically for the WiFi switch. Oops.

Did you happen to keep that bad cable? It would be terribly interesting to see which kinds of bit patterns would trigger it.

Sadly no. I told my co-worker about the cable and he told me that he had had the same issue with a patch cable too. He had damaged the cable with his office chair's wheels and replaced it. So he probably stored it away, and I used his old damaged cable for months until I figured out my upload issue.


Seeing as the error happened at the same place each time, it sounds like you might have been getting errors that didn't _always_ get caught by the checksum algorithm (i.e., the error "happened" to match the checksum) (maybe).

Sounds like another good reason to do separate checksumming of files after download (read: sha1/md5).
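A minimal sketch of that kind of post-download check, using Python's standard `hashlib`. The function names, the default of SHA-1, and the chunk size are choices made for this example; the expected digest would come from whoever published the file.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha1",
                chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_download(path: str, expected_hex: str,
                    algorithm: str = "sha1") -> bool:
    """Compare the file's digest with the one published by the sender."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

This catches corruption that slips past link-level CRCs and the TCP checksum, because the digest is computed end to end over the bytes that actually landed on disk.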

Oh yes, a crypto checksum is always useful. Another bug (also eons ago) was where something was trying to patch the TCP options, with a very lax assumption about the packet - as a result, with some kinds of fragmentation it would think it was patching the option, whereas it was smashing the payload... Of course, after patching the "TCP option" it was adjusting the TCP checksum, or, rather, the place where it thought the TCP checksum was - which in that case was again within the payload.

Needless to say, the whole operation made the corruption checksum-neutral from the viewpoint of the "real" checksum, so it was not caught by the TCP checksum; as a result, the files were sometimes silently corrupted.

Took a lot of work to catch and debug, the symptom was that Gentoo's packages appeared to be corrupt and failed the SHA1 checksum check.

So, indeed, cryptographic checksumming of files is a good thing.
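Why such a patch ends up checksum-neutral can be shown with a small sketch. It assumes the device used the standard incremental update from RFC 1624 (HC' = ~(~HC + ~m + m')) on whatever 16-bit word it believed was the checksum field; the payload, the offsets, and the function names are invented for illustration, and a real TCP checksum also covers a pseudo-header that is left out here.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum used by the Internet checksum (RFC 1071)."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    return ~ones_complement_sum(data) & 0xFFFF


def patch_and_adjust(buf: bytearray, word_at: int, new_word: int,
                     checksum_at: int) -> None:
    """Overwrite one 16-bit word, then 'fix' the word at checksum_at as if
    it were the checksum field. Both offsets must be even and distinct."""
    old_word = int.from_bytes(buf[word_at:word_at + 2], "big")
    old_field = int.from_bytes(buf[checksum_at:checksum_at + 2], "big")
    total = (~old_field & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    buf[word_at:word_at + 2] = new_word.to_bytes(2, "big")
    buf[checksum_at:checksum_at + 2] = (~total & 0xFFFF).to_bytes(2, "big")


original = b"some package payload bytes!!"
patched = bytearray(original)
# Both offsets point into the payload, as in the misparsed-packet case.
patch_and_adjust(patched, word_at=4, new_word=0x0204, checksum_at=10)

print(bytes(patched) != original)                                # True
print(internet_checksum(original) == internet_checksum(patched))  # True
```

Four bytes of payload change, yet the ones' complement sum over the data is identical, so the real checksum elsewhere in the segment still validates. A cryptographic digest over the file would differ.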

Crazy. There's all sorts of strange things that can go wrong over the wire, especially if you're operating at the wrong level of abstraction.


Good conclusion in that article! Sometimes (actually quite often in practice) neither of the parties does anything blatantly wrong, but just bends the assumptions in slightly different directions, together becoming a trigger for a failure.

"Reminds me of the time when ..." printing a file with some patched up negative page count actually upped your printing quota in university.

I found something similar at my university where you could underflow the unsigned short they used to track your quota.

The system would of course refuse any single job that took you over quota but, if you wrote a script that pounded a few jobs slightly below quota right after each other, two or more would get through, underflow and you'd suddenly have 65k pages left.

I know when you left the school you could get reimbursed for any remaining pages left on your ID but I wasn't crazy enough to try that. 65k * $0.05 = a chat with the police, I'd guess.
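A sketch of how the check-then-deduct race described above could wrap a 16-bit counter. The `QuotaAccount` class and the page numbers are made up, and the race is modelled deterministically by checking both jobs before deducting either, rather than with real threads.

```python
def to_uint16(value: int) -> int:
    """Wrap an integer the way an unsigned short would."""
    return value & 0xFFFF


class QuotaAccount:
    def __init__(self, pages_left: int) -> None:
        self.pages_left = to_uint16(pages_left)

    def allows(self, job_pages: int) -> bool:
        # Each job is checked only against the balance at submission time.
        return job_pages <= self.pages_left

    def deduct(self, job_pages: int) -> None:
        # No second check here: the subtraction can wrap below zero.
        self.pages_left = to_uint16(self.pages_left - job_pages)


account = QuotaAccount(pages_left=10)
jobs = [9, 9]  # each job alone is slightly below the quota

# Both jobs pass the check before either one has been charged.
approved = [pages for pages in jobs if account.allows(pages)]
for pages in approved:
    account.deduct(pages)

print(account.pages_left)  # 65528
```

Doing the check and the deduction as one atomic step, or using a signed balance and rejecting negative results, would close the hole.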

Some similar bugs; the latter one looks like an urban legend:

"Printer won't print on Tuesdays": http://mdzlog.alcor.net/2009/08/15/bohrbugs-openoffice-org-w...

"A member of the famous Black Team manages to create a sequence of operations that topples over the tape drive": http://www.penzba.co.uk/GreybeardStories/TheBlackTeam.html

The Black Team link inside your link is broken, http://www.t3y.com/tangledwebs/07/tw0706.html is blocking all robots so no Web Archive copy either. I think this is a copy of it: http://www.t3.org/tangledwebs/07/tw0706.html

> The Black Team link inside your link is broken ...

Fixed - thank you.

I can't find it now, but one of my favourites was the 'virus' that caused Word to keep typing phrases about Mozart, Bach, and nuclear weapons in Iran. Turned out to be a voice activation program running in the background, next to a radio playing Classic FM.

> "A member of the famous Black Team manages to create a sequence of operations that topples over the tape drive": http://www.penzba.co.uk/GreybeardStories/TheBlackTeam.html

The "in front of the press" bit is almost certainly an urban legend, but the concept did happen: http://www.catb.org/jargon/html/W/walking-drives.html

In college I shared an apartment with some friends. Every few weeks, when somebody was watching a movie it would spontaneously rewind part-way into the movie. It happened infrequently enough that we didn't really investigate until one night when a bunch of people were over. The movie would rewind whenever the lights in the living room were off, and somebody pulled the cord on the ceiling fan to change its speed. It was 100% reproducible, but baffling.

After much investigation, we traced the problem to a Rock Band drum set foot pedal. It has an AC adapter plugged into the wall, and also acts as pressing L on the PS3 controller when it's triggered. Somehow, pulling the cord on the ceiling fan made the foot pedal act like it got kicked, which would rewind the movie playing on the PS3. Unplugging the foot pedal or drum set solved the problem.

Wow I am old. The entire time I was picturing a VCR.

It's caused by the "rewind" word. I never owned a VCR but still thought about one :). The software started to use other terms for the action, like "jump backwards" (VLC media player). Probably because the actual rewind action (show the movie backwards very fast) is not easy to do in compressed video (to show the last frame before a keyframe you have to seek the previous keyframe and recompute all the frames until the desired one).

It's too bad that current media formats don't allow you to scrub (at least, the ones I usually come across - I have no idea what makes a compression format "scrubbable"). IIRC, this was one of the selling points of Quicktime.

A scrubbable format has only key frames and no frames that are in any way predicted from what came before. That's actually the major point that differentiates formats intended for consumption from those intended for editing. If you usually only watch something from start to end, you can vastly increase the compression ratio by reusing parts of earlier frames that changed very little. However, this means that to get to a particular frame you have to find the last keyframe before it and then decode from there.
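The seek cost described above can be sketched with a toy model in which a stream is just a string of frame types. It assumes only I-frames (keyframes) and P-frames (predicted from the previous frame); B-frames, which also reference later frames, are ignored, and `frames_to_decode` is a name invented for this example.

```python
def frames_to_decode(frame_types: str, target: int) -> range:
    """Return the indices that must be decoded to display frame `target`.

    frame_types is a string such as "IPPPIPPP": "I" marks a keyframe,
    "P" a frame predicted from the one before it.
    """
    if not 0 <= target < len(frame_types):
        raise IndexError("target frame out of range")
    keyframe = frame_types.rfind("I", 0, target + 1)
    if keyframe < 0:
        raise ValueError("no keyframe at or before the target frame")
    return range(keyframe, target + 1)


stream = "IPPPPPPPIPPPPPPP"  # a keyframe every 8 frames

print(list(frames_to_decode(stream, 14)))  # [8, 9, 10, 11, 12, 13, 14]
print(list(frames_to_decode(stream, 8)))   # [8]
```

Stepping backwards one frame at a time repeats most of this work for every frame shown, which is why an all-keyframe format scrubs so much more cheaply.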

Can't you apply B/P-frames in reverse, in theory? Or actually only show the I-frames.

edited for typos

You could, in theory. Most video streams have an I-frame every few seconds, which is perfect for fast-forward and rewind.

Just because a bug seems impossible doesn't mean it is.

Why would anyone who has used a printer more than once think it would be impossible to reliably jam a printer...

Because it seems strange that anything involving a printer would be reliable.

My favourite impossible bug was during an autonomous robotics course in which we had to design a small robot that would collect balls and move them within a target area using quite simple sensors.

One of these sensors was a simple conductive sensor which sensed metallic tape on the floor of the arena.

One of the teams had a freezing issue that they couldn't get to the bottom of, until they realised that it froze when their back roller (a sphere in a cage so it can rotate in 2 directions) went over the metallic tape.

The theory goes that the moving lego parts generated static electricity which was then conducted somehow down to the metallic tape, which caused some fault in the single board PC we used.

I wouldn't have believed it unless I'd seen it. Replacing the metal roller with a lego piece cured the issue.

This reminds me of an article I read some years ago about the powers and pitfalls of genetic algorithms. I don't remember the details, and I'd love to find a link to it again. The main thing was that these guys were using GAs to generate circuits to perform a simple signal-processing function. They worked fine, but they were blindingly incomprehensible, the physical equivalent of undocumented spaghetti code.

One of them memorably had a wire or a resistor or something that wasn't connected to anything else in the circuit. So they removed it.

The circuit stopped working.

They put it back in.

It started working again.

The best anyone could figure was that that bit of metal was interacting electromagnetically with the rest of the circuit in some immeasurably small way that the entire thing depended upon.

It was a tone discriminator and the experiment was run by Dr. Adrian Thompson.


Shorter article: http://www.damninteresting.com/on-the-origin-of-circuits/

Related: "more magic" switch: http://www.catb.org/jargon/html/magic-story.html

Sounds like the neural net tank story: https://neil.fraser.name/writing/tank/

Related random printer bug: https://news.ycombinator.com/item?id=8171956

Nice, I remember the LPS-20 :-) My favorite though is the person who figured out you could pwn a printer by printing the right document on it [1]. For network attached printers you could print this guy's resume and he would then have a node under his control on your network. Scary and cool all at the same time.

[1] http://events.ccc.de/congress/2011/Fahrplan/events/4780.en.h...

> Just because a bug seems impossible doesn't mean it is.

How true.

TL;DR: The render time of that file caused the drum to have to stop just briefly enough so that it couldn't start properly again.

Except this post wasn't "too long" at all.

I didn't mean to imply it was, just trying to be helpful for those who are curious but don't have time.

If they don't have time they probably shouldn't be spending it on HN, ;).

For short stories like this I think http://tldr.io/ is a better place for TL;DR's than the comments.

I can't see an attached file. I would like to try it out. Did I miss something?

It was 20 years ago: the bug is long since patched and the printer put out to pasture.
