

The Craziest F***ing Bug I've Ever Seen - wycats
http://yehudakatz.com/2010/01/02/the-craziest-fing-bug-ive-ever-seen/

======
wrs
I guess any bug that occurs below your accustomed abstraction level is
"crazy". If most of your work happens in an interpreted high-level language, a
quite mundane bug in the runtime (like this) is "crazy". Hardware bugs are
"crazy" to most software people at any level, but to hardware folks they're
just bugs.

~~~
Eliezer
_I guess any bug that occurs below your accustomed abstraction level is
"crazy"._

Good way of looking at it. So the ultimate crazy bug would be an error in the
laws of physics? Actually, one better than that would be discovering an error
in the laws of math.

~~~
tel
And the singular craziest bug is Incompleteness?

~~~
idlewords
not a bug, a feature

------
Eliezer
If this is the craziest bug you've ever seen, my guess is that you've spent
your whole career programming in languages that lack raw pointers.

~~~
wycats
Hehe. I guess you didn't get the reference
([http://wikiality.wikia.com/The_Craziest_Fucking_Thing_Ive_Ev...](http://wikiality.wikia.com/The_Craziest_Fucking_Thing_Ive_Ever_Heard))

------
jacquesm
If that's the craziest bug you've ever seen you've lived a sheltered life.

As long as you don't need to break out the logic analyzer or resort to sifting
through a stack of 4" of fanfold assembly printouts you're doing just fine :)

Bugs like that are always grounded in the assumption that lower layers are
reliable. Hint: they're not. They contain _tons_ of bugs, and the conditions
that trigger them get rarer the longer a platform has been in production. That
means the 'hardest' bugs are going to be the last ones found, and if you find
something in a platform that has been operational for a while you can bet your
life it's not going to be easy to track down.

Interrupt triggered bugs are usually really hard to find as are subtle timing
issues in high speed serial links. If it takes you less than a week to track
it down it may be the hardest bug you've ever seen but in the greater scheme
of things you're not in real trouble yet.

~~~
alain94040
_Interrupt triggered bugs are usually really hard to find_

Very true. Anything that is reproducible is fairly easy to debug. Something
that is hard to reproduce is far harder to figure out.

Here's a good scenario, on a good old Apple II: you'd call a subroutine. The
processor would put your address on the stack, to know where to return. So
when the processor returns, you can peek at the data just below the stack
pointer to figure out your own address. Works 99.999999% of the time. Except
if an interrupt occurred exactly on the return instruction, erasing the old
stack data.

With that kind of frequency (interrupts would happen at about 60 Hz on an
Apple II), you'd freeze the computer after several minutes. Good luck figuring
it out.
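
A toy simulation may make the failure easier to picture. This is only a sketch in Ruby, not Apple II code: `ToyStack`, `call_and_peek`, and all the addresses and byte values are made up for illustration, and it models just the two properties the scenario relies on (a pull leaves the old bytes in memory, and an interrupt pushes its own bytes onto the same stack). It ignores details of the real CPU such as the exact address value that gets stored.

```ruby
# Toy model of a downward-growing 256-byte stack. Pulling a byte only moves
# the pointer; the old byte stays in memory until something overwrites it.
class ToyStack
  def initialize
    @mem = Array.new(256, 0)
    @sp = 0xFF # points at the next free slot
  end

  def push(byte)
    @mem[@sp] = byte & 0xFF
    @sp = (@sp - 1) & 0xFF
  end

  def pull
    @sp = (@sp + 1) & 0xFF
    @mem[@sp]
  end

  # Reassemble an address from the stale bytes a return just left behind.
  def peek_stale_address
    lo = @mem[(@sp - 1) & 0xFF]
    hi = @mem[@sp]
    (hi << 8) | lo
  end
end

# Simulate: call a subroutine, return, then peek at the stale return address.
# Optionally let an interrupt fire between the return and the peek.
def call_and_peek(interrupt_during_return:)
  stack = ToyStack.new
  return_address = 0x0302 # hypothetical caller address

  # Subroutine call: push the return address, high byte first.
  stack.push(return_address >> 8)
  stack.push(return_address & 0xFF)

  # Return: pull both bytes back off.
  2.times { stack.pull }

  if interrupt_during_return
    # Interrupt entry pushes an address and a status byte (made-up values),
    # landing on top of the bytes we were about to read.
    [0x12, 0x34, 0x30].each { |b| stack.push(b) }
    # Interrupt exit pulls them again, restoring the stack pointer.
    3.times { stack.pull }
  end

  stack.peek_stale_address
end

normal    = call_and_peek(interrupt_during_return: false)
clobbered = call_and_peek(interrupt_during_return: true)

puts format("no interrupt:   0x%04X", normal)    # 0x0302
puts format("with interrupt: 0x%04X", clobbered) # 0x1234
```

The stack pointer ends up in the same place in both runs, so nothing looks wrong afterwards; only the bytes below it differ, which is why the trick fails silently.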

------
sketerpot
My craziest bug was a program that would become flaky if you stood too close
to the computer. Eventually I discovered that something wasn't grounded
properly, and the program was being messed up by the electric fields of our
bodies. (Later, the CPU exploded.)

~~~
ErrantX
> (Later, the CPU exploded.)

 _That's not a bug. That's a feature._

~~~
die_sekte
It's only a feature if the computer is a bomb. (Or if someone pressed the
self-destruct button.)

------
z8000
Neat bug but hardly what I would consider "crazy". The source code was
available for tracing (and that's how the root cause was made clear).

I don't consider a bug to be "crazy" unless the word "volatile" is involved :)

~~~
sketerpot
I consider segfaults in MPI programs to be crazy, if only because _damn it we
should not have to use C for this!_ I hope Fortress is ready for mainstream
use soon, because writing parallel programs in C is inherently nuts.

(This is not to put down weird volatile bugs, of course.)

------
jey
> Before calling the method_missing method itself, Ruby sets a thread-local
> variable called call_status that reflects whether or not the original call
> was a vcall or a normal call.

Yuck, MRI (Matz's Ruby Interpreter) uses a global variable for that? (Yes OK,
it's actually thread-local, but that's effectively the same thing.) Awful
style begets awful bugs. That's what you get for having such a messed up
interface.
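
For anyone unfamiliar with the vcall distinction in the quote, here is a small Ruby sketch of its visible effect. It does not show MRI's internals or the bug from the article; `error_class_for` and `undefined_thing` are hypothetical names chosen for the example. It only shows that Ruby reports a failed bare identifier differently from a failed call with parentheses, which is the information the interpreter has to carry into `method_missing`.

```ruby
# Run a block and report the class of the NameError (or subclass) it raises.
# NoMethodError is a subclass of NameError, so one rescue catches both.
def error_class_for
  yield
  nil
rescue NameError => e
  e.class
end

# A bare identifier could be a local variable or a method: a "vcall".
vcall_error = error_class_for { undefined_thing }

# With parentheses it can only be a method call.
call_error = error_class_for { undefined_thing() }

puts vcall_error # NameError
puts call_error  # NoMethodError
```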

~~~
wycats
Ha. I _knew_ someone was going to use this post as an excuse to wail on Ruby.
No other Ruby VM uses a global (or thread-local) for this problem; it's not
necessary to implement the functionality. While we were working through the
problem, Evan (who works on Rubinius, another Ruby VM) remarked how much
better the other implementations (like JRuby and Rubinius) handled the same
problem.

~~~
jey
I meant "MRI" and not "Ruby". It's clearly just an implementation detail and
not mandated by the language itself. I'll change the original post to clarify
this.

If we're going to be pedantic, it's a "Ruby Interpreter" or "Ruby
implementation", but not "Ruby VM". Ruby implementations can use a VM-based
design, but it's certainly not required.

~~~
wycats
Thanks! Ruby 1.9 and Rubinius are both "Ruby VMs", while Ruby 1.8 is an
interpreter, and IronRuby/JRuby leverage existing VMs. In this case, Evan, who
works on a Ruby VM, was looking at Ruby 1.9, another Ruby VM ;)

------
idlewords
It strikes me looking at the discussion here that "tell me about the craziest
bug you've ever fixed" may be a very useful interview question. You quickly
get an idea of someone's thought process, problem solving strategies, and
level of experience.

------
ErrantX
If you think that is crazy, try this hardware-related bug that STILL eludes me.

Netgear router -> Linksys range expander -> Netgear switch -> a set of
machines.

Problem is that in that set there is a Vista machine that randomly kills the
wireless network for _some_ reason. The Linksys connection light turns red and
the wireless on both the router and the range expander disappears. Everything
has to be power-cycled numerous times to get it back up. No other machines
(Windows or Linux) do this to the network...

Trying to track down a "problem" across multiple hardware/software vendors is
not fun :)

So I'm gonna claim the craziest f***ing bug crown :)

------
teeja
My craziest bug was making text-only changes to an html file I'd edited for
months, saving the file, then seeing no changes in the browser.

Somehow I'd opened a second editing window for the file, exactly on top of the
first window - and the editor was saving with a slightly different name.

