

Lamest bug we ever encountered - exch
http://joostdevblog.blogspot.com/2011/12/lamest-bug-we-ever-encountered.html

======
akg
Reminds me of the time I wrote a physics simulation engine back in grad
school and there was a "minus" sign error. Of course, the error was rare
enough that we didn't notice it until after the code was used in a real
production environment. Tracking down one minus sign in several hundred
thousand lines is a pain. Not to mention the uneasy feeling you get after
you solve it: "How was everything ever working correctly before!? What else
did we overlook?"

~~~
Confusion
If I had to venture a guess, you didn't have a comprehensive set of tests at
the function/method level of the code? Having that would probably have caught
the bug, because you would have written a test that exercises the code in
that branch and checks its result.
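
As a rough illustration of what such a branch-level test might look like, here
is a minimal Python sketch. The function `reflect_velocity`, its parameters,
and the collision rule are all hypothetical; nothing is known about the
actual engine. The point is only that a test targeting the rarely taken
branch would fail if the minus sign were missing.

```python
import unittest


def reflect_velocity(position, velocity, floor=0.0, restitution=0.5):
    """Hypothetical helper: bounce a falling particle off a floor."""
    if position <= floor and velocity < 0.0:
        # Rarely taken branch: the particle hit the floor while moving down,
        # so the velocity must change sign. Dropping this minus sign is the
        # kind of error that goes unnoticed without a dedicated test.
        return -velocity * restitution
    return velocity


class ReflectVelocityTest(unittest.TestCase):
    def test_free_flight_is_unchanged(self):
        # Common branch: above the floor, velocity passes through untouched.
        self.assertEqual(reflect_velocity(1.0, -2.0), -2.0)

    def test_collision_flips_sign(self):
        # Rare branch: at the floor and moving down, velocity points up after.
        self.assertEqual(reflect_velocity(0.0, -2.0), 1.0)
```

Saved to a file, this can be run with `python -m unittest <file>`.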

~~~
akg
You're right. But it was after that painstaking experience that I became
fully engrossed in using unit tests for all non-trivial functionality. Live
and learn.

------
AndyKelley
I'm not completely satisfied by the explanation. I still have that uneasy
feeling that you get when you solve a bug, but an unsolved mystery remains.
"Also, I still don't know why not all consoles connected to that PC froze."

~~~
radarsat1
He didn't mention how the logging was done, but if it was over a TCP
connection, then the send() call probably blocked until it timed out, since
the sleeping computer didn't close the socket cleanly. After that it had to
re-establish the connection. Although reliability is nice, if I were writing
a remote logger for something like a game, I think I'd use UDP.
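
A minimal Python sketch of that idea, assuming a fire-and-forget logger where
losing messages is acceptable. The class name `UdpLogger` and its interface
are made up for illustration; only the standard `socket` module calls are
real. Because UDP is connectionless, a sleeping or absent receiver cannot
stall the sender; the datagrams are simply lost.

```python
import socket


class UdpLogger:
    """Hypothetical fire-and-forget logger: messages may be silently lost."""

    def __init__(self, host, port):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Non-blocking, so even a full local send buffer cannot stall the caller.
        self._sock.setblocking(False)

    def log(self, message):
        """Try to send one datagram; return False if it had to be dropped."""
        try:
            self._sock.sendto(message.encode("utf-8"), self._addr)
            return True
        except OSError:
            # Includes BlockingIOError: drop the message rather than wait.
            return False

    def close(self):
        self._sock.close()
```

The trade-off is the one mentioned above: no delivery guarantee and no
ordering guarantee, in exchange for never waiting on the log server.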

~~~
AndyKelley
Are you trying to explain how it's possible for some of the consoles to freeze
but others not while talking to the same sleeping computer? If so, I did not
understand your explanation.

~~~
alexgartrell
I believe socket writes don't block until you've filled the internal socket
buffer, so it's likely that the unaffected machines simply hadn't done this
yet.
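
This behaviour can be observed with a small Python sketch. It uses a local
`socket.socketpair()` as a stand-in for the TCP connection to the log server
(an assumption; the real setup isn't described), and the helper name
`bytes_until_full` is made up. The peer never reads, much like the sleeping
PC, and the writer counts how many bytes are accepted before a send would
have to block.

```python
import socket


def bytes_until_full(chunk_size=4096):
    """Write to a socket whose peer never reads.

    Returns the number of bytes the OS accepted before a send would block.
    """
    writer, reader = socket.socketpair()
    # Non-blocking mode turns "would block" into an exception we can observe.
    writer.setblocking(False)
    chunk = b"x" * chunk_size
    sent = 0
    try:
        while True:
            try:
                sent += writer.send(chunk)
            except BlockingIOError:
                # The buffers are full: a blocking socket would freeze here.
                return sent
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    print("bytes accepted before blocking:", bytes_until_full())
```

The exact number depends on the operating system and its buffer settings. A
console that logs less would take longer to reach that limit, which fits the
explanation above.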

~~~
AndyKelley
Ah, there's the missing piece of information. Now I get it, thanks.

------
botker
I'm reminded of this story of the folks who worked on LEO hunting down a
similarly difficult-to-find bug that was eventually found to be caused by an
unrelated external machine: the manager's elevator.
[https://www.youtube.com/watch?v=Lrn24SdW64I&t=2m50s](https://www.youtube.com/watch?v=Lrn24SdW64I&t=2m50s)

------
einhverfr
I once spent an afternoon tracking down a "bug" where sales tax wasn't being
calculated in LedgerSMB, only to find out I had set the tax rate to 0 in the
tax interface... OK, it was working as intended. I felt pretty sheepish too.

~~~
decadentcactus
The worst bugs are when things work as intended, but you still think it's a
bug, such as your example.

~~~
Natsu
It's worse when your users find these and are all mad because the computer did
exactly what they told it to.

~~~
AndyKelley
Nah, then it's a bug in your user interface.

~~~
einhverfr
While I am sympathetic to this argument, I would say that is not always the
case. Some configuration is usually required, and when something is set up
for a specific case, behaves correctly for that case, and the user simply
forgot that this is what they did, then it's a bug only in the storage
retrieval routines of the user's own memory.

------
TwoBit
They could have solved that bug with one developer in ten minutes by just
telling the PS3 to generate a core dump and running addr2line.exe on the core
dump report's callstacks.

And the report places the blame on the server instead of their code. Clearly
it's their code's fault for making blocking socket calls on the main thread.
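
One common way to keep a slow log sink off the main thread is a bounded queue
drained by a worker thread. The Python sketch below is only an illustration
under that assumption; `BackgroundLogger`, `sink`, and `max_pending` are
hypothetical names, and this is not how the game in the article was fixed.

```python
import queue
import threading


class BackgroundLogger:
    """Hypothetical logger: log() never blocks, a worker does the slow I/O."""

    def __init__(self, sink, max_pending=1024):
        self._sink = sink  # callable that may block, e.g. a socket send
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, message):
        """Queue a message; return False (dropping it) if the queue is full."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            message = self._queue.get()
            if message is None:  # sentinel pushed by close()
                break
            self._sink(message)  # only this thread can stall here

    def close(self):
        # Note: waits for pending messages, so call it at shutdown only.
        self._queue.put(None)
        self._worker.join()
```

Usage would look like `logger = BackgroundLogger(send_to_server)` followed by
`logger.log("frame rendered")` from the game loop, where `send_to_server` is
whatever blocking function actually writes to the network. If the server
stops reading, the queue fills up and messages are dropped, but the main
thread keeps running.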

------
zitterbewegung
This looks like an interesting bug. I wonder if there are more bugs like this
on the web side, such as analytics tools giving you false or misleading
information? Or even monitoring or performance tools?

------
simoncpu
The lamest bug you will ever encounter deletes your whole /usr.

~~~
manojlds
How is that lame?

~~~
narcissus
I think he's talking about this
[https://github.com/MrMEEE/bumblebee/commit/a047be85247755cdb...](https://github.com/MrMEEE/bumblebee/commit/a047be85247755cdbe0acce6f1dafc8beb84f2ac#diff-1)
, where the deletion of /usr was not on purpose... the bug was a space in the
middle of a file path in the install script.
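
To see why one stray space is so destructive, the Python sketch below uses
`shlex.split` to tokenize a command line the way a POSIX shell would. The
paths are hypothetical stand-ins, not the ones from the linked commit, and
`rm_targets` is a made-up helper that only lists what `rm` would be asked to
delete; nothing is removed.

```python
import shlex
from typing import List


def rm_targets(command: str) -> List[str]:
    """Return the path arguments a shell would pass to the command."""
    argv = shlex.split(command)  # POSIX-style word splitting on whitespace
    return [arg for arg in argv[1:] if not arg.startswith("-")]


intended = "rm -rf /usr/lib/example/xorg"   # hypothetical intended command
buggy = "rm -rf /usr /lib/example/xorg"     # same command with a stray space

print(rm_targets(intended))  # ['/usr/lib/example/xorg']
print(rm_targets(buggy))     # ['/usr', '/lib/example/xorg']
```

With the space, the single path becomes two separate arguments, and the first
one is all of `/usr`.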

