

Worst Bug Ever - swanson
http://swanson.github.com/blog/2013/01/20/worst-bug-ever.html

======
fosap
IMO Matlab is the worst programming language widely used in production.

This perfectly fits my expectations of Matlab:

\- i is mutable, but should be constant

\- It has a horrible, non-obvious name because mathematicians are used to it.
(And they are not used to var names longer than two chars)

\- It is in the global scope.

\- It's a feature that has a special syntax (sort of).

/edit Apologies to PHP programmers; maybe PHP is worse. But I'm not sure.

~~~
nemetroid
> \- i is mutable, but should be constant

It's not really mutable, because it isn't a variable. However, if there is a
variable by that name, Matlab will use it instead of the complex constant.

> \- It's a feature that has a special syntax (sort of).

The problem is rather _a lack_ of special syntax (or too much leniency).
Python also has built-in complex numbers:

    
    
        >>> 1 + 1j
        (1+1j)

but it doesn't allow you to omit the constant before the j:

    
    
        >>> 1 + j
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        NameError: name 'j' is not defined
    

You can do it the same way in Matlab to avoid this problem (1 + 1i is always
OK since implicit multiplication is not allowed), so the problem is that
Matlab lets you use i syntactically like a variable when it in fact isn't.

~~~
andreasvc
It will only give that NameError if 'j' is in fact undefined; 'j' can be used
as a variable which would lead to a similar bug in Python (and why is it 'j'
instead of 'i' ... they're both common one-letter variables).
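
A quick way to see both behaviours side by side is the sketch below (the variable names are made up for illustration): a bare `j` is an ordinary name and picks up whatever was last assigned to it, while the literal `1j` never goes through a name lookup.

```python
# The literal 1j is parsed as a complex number; no name lookup is involved.
z_literal = 2 + 3 * 1j

# A bare j is just a variable. Here a loop counter is left behind with the value 4.
for j in range(5):
    pass

z_variable = 2 + 3 * j  # silently real-valued: 2 + 3*4

print(z_literal)   # (2+3j)
print(z_variable)  # 14
```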

~~~
DoubleMalt
Electrical engineers use j instead of i to distinguish it from current (that
is commonly referenced by i).

~~~
shardling
Whereas physicists use capital I for current. (And J for the current density.)

~~~
fosap
I have seen j for current density only so far. But physicists are convinced you
can tell the difference by context (which is usually true, but probably
impossible for a compiler, even if physics is "typed" with units). For
handwritten text at least my professors managed to come up with at least two
different ways to write each letter to avoid confusion. My (not serious)
attempts to use Chinese characters haven't been very successful.

------
danialtz
I was using a very well-known, popular computation package, a direct
competitor to ANSYS Multiphysics. I used it from the early version 3 days
through 4.x, over a period of 3 years during my PhD in the field of
fluid-structure interaction (e.g. fish in flow). The computations usually took
several days on many CPUs for 1 second of real-life simulation, making testing
quite painful. Hence, many studies do not involve testing the way the software
field does.

I was calculating a parameter enhancement where my calculations were showing a
great 180% enhancement; this result would have made it into a very
high-ranking (15+ IF) paper right away. By then, several people had written
numerous papers on this matter using this multiphysics software in well-known
journals. But my gut feeling didn't let me. Even though my professors were
pressing me to publish, I took 6 painful months (toward the end of my PhD) to
rigorously test the software itself, which had nothing to do with my work,
since it was closed-source.

After 6 months, I found out that a velocity term used in their stabilization
algorithm did not use relative velocity, but absolute velocity, leading to
an enormous amount of fake diffusion in the model. I had to travel to another
city in Europe to convince them that there was such a bug.

Every single work in my field that used this software till then had been
severely invalid. And what the developers did: they fixed the bug in the next
sub-release without putting it in the changelog like nothing happened.

~~~
KMag
I don't suppose you were able to publish a peer-reviewed journal article
calling previous results into question, were you?

------
jballanc
I know the feeling, but I think I have a bug that beats even this one...

Once, I was working on the "alias" keyword in MacRuby, and getting it to work
with methods defined in Obj-C. Since MacRuby objects are just extensions on
Obj-C objects, some methods are defined in Obj-C and require special casing in
order to alias them to a different identifier. For example, if you want to
alias "Array#length" to a different name, you'd actually be aliasing
"-[NSArray length]".

So I do everything I should need to do to make it work: get a handle to the
object, the object's class, the method defined for the class, a selector for
the new name...but no matter what I tried, every time I aliased the method,
and attempted to call the method by the new name, I would always get the same
error: "Method not defined".

Finally, I decided to dive deep with gdb and find out exactly what was going
on. To my shock and amazement, I came to realize that I _was_ properly
aliasing the method but, due to the nature of NSArray's implementation as a
class cluster (and the fact that Obj-C doesn't have proper virtual methods
like C++), the implementation of "-[NSArray length]" was literally:

    
    
        - (NSInteger)length
        {
          [NSException raise:@"Method not defined." format:@"%"];
        }

~~~
eridius
Are you saying that when a name is aliased in MacRuby, the alias becomes a
copy of the IMP from that particular class, such that subclass overrides of
that method are never invoked? Because that sounds like a bad idea.

~~~
jballanc
Aliasing in MacRuby (or Obj-C for that matter) involves adding a new entry to
a class's method table with the new name but a pointer to the old
implementation. Method lookup from subclasses to superclasses proceeds the
same as it would for any method.

~~~
eridius
Yes, but if you message the subclass with the alias, only the superclass's
method gets invoked, because the alias apparently contains the IMP of the
superclass's method. Or it must, if what you said before is to be believed.

This is pretty nasty, because it breaks the idea of message sends and subclass
overrides of methods. There are two ways you could solve it:

1\. Set the IMP to a static function that looks up the method it should
forward to and re-send the message there. This is hard to do in a manner that
supports all arguments, and you'd need at least 2, possibly 3 versions anyway
(one for regular, one for strret, and on some architectures, one for fpret).
It would probably need to be written in assembly in order to forward the
arguments.

2\. More recently (in iOS 4.3, and whatever the corresponding OS X release
was), the runtime got a nice little function called
imp_implementationWithBlock(). This takes a block and returns an IMP. Bill
Bumgarner has a nice blog post about it
([http://www.friday.com/bbum/2011/03/17/ios-4-3-imp_implementa...](http://www.friday.com/bbum/2011/03/17/ios-4-3-imp_implementationwithblock/)).
You could use this function on every alias in order to invoke the correct
method, passing along the correct arguments.

~~~
jballanc
That may be nasty, but that's how the semantics of method aliasing in Ruby
work. Try this Ruby snippet to see what I mean:

    
    
        class Foo
          def say; puts "hello"; end
          alias shout say
        end
        
        class Bar < Foo
          def say; puts "howdy"; end
        end
        
        Foo.new.say #=> "hello"
        Bar.new.say #=> "howdy"
        Foo.new.shout #=> "hello"
        Bar.new.shout #=> "hello"
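
For anyone without Ruby handy, roughly the same behaviour can be sketched in Python, assuming that binding a second class attribute to the same function is a fair stand-in for `alias` (the class and method names simply mirror the Ruby snippet):

```python
class Foo:
    def say(self):
        return "hello"

    # Bind a second name to the function object that say refers to right now.
    shout = say


class Bar(Foo):
    # Overrides say, but shout in Foo still points at Foo's original function.
    def say(self):
        return "howdy"


print(Foo().say())    # hello
print(Bar().say())    # howdy
print(Foo().shout())  # hello
print(Bar().shout())  # hello
```

The alias captures the implementation that existed when it was created, so a later override of the original name does not reach it.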

~~~
eridius
Oh ugh. When is an alias not an alias? When it's Ruby, I guess.

------
jasonkester
I had a similarly fun time back in the 90s debugging a VBA script a co-worker
had written for an Excel spreadsheet. She was looping through a bunch of time
values...

... so she named her variable "time"

... which, when modified, would (naturally) set the system time of the
computer.

I'm not sure whether they've fixed that one yet.

~~~
ja27
At least once a quarter I'd have a student working on some programming
assignment on one of our unix-ish systems where they named the executable
"test" and couldn't figure out why it wasn't doing anything they expected.

I also got to the point with FORTRAN (yes, I'm old) students where I'd cut
them off and tell them to count the columns and make sure each line started
with 6 blanks, not 5.

My favorite bug is still some piece of C code buried in a library that was
like this:

    
    
       if (flag == TRUE) ...
    

Or there's the old for loop typo (much easier to detect on today's high-DPI
screens):

    
    
       for(float x=0.0; x < 3,4; x = x + 0.1) ...

~~~
viraptor
I don't get what's wrong with the flag check apart from being a bit too
verbose.

~~~
mpyne
TRUE is going to be 1. The flag value might have the value 0x2, 0x4,
0xsomething_else, etc., which will not equal exactly 1 (TRUE).

What you want varies but is probably something like:

    
    
        if (flag & FLAG_VALUE) ...
        if (flag) ....
        if ((flag & FLAG_VALUE) == TRUE) ...
    

that kind of thing.

~~~
geon

        if ((flag & FLAG_VALUE) == TRUE) ...
    

That would still check if flag is 1, right?

~~~
azth
That is correct.

~~~
mpyne
Indeed, great catch (this is why I use things like QFlags now, as it has a
simple "testFlag" method)
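
The point of this sub-thread can be checked with a few lines of Python standing in for the C. The constant values are assumptions picked for illustration; any `FLAG_VALUE` other than 1 shows the same thing.

```python
TRUE = 1
FLAG_VALUE = 0x4  # hypothetical bit position, chosen only for this example

flag = 0x4 | 0x2  # two bits set, including the one we care about

equals_true = flag == TRUE                        # 6 is not 1
masked_equals_true = (flag & FLAG_VALUE) == TRUE  # masked value is 4, not 1
masked_truthy = bool(flag & FLAG_VALUE)           # any non-zero result means "set"
masked_nonzero = (flag & FLAG_VALUE) != 0         # explicit spelling of the same test

print(equals_true, masked_equals_true)  # False False
print(masked_truthy, masked_nonzero)    # True True
```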

------
dpark
> _the imaginary constant i in MATLAB was getting overwritten!_

The constness of _i_ in Matlab is also imaginary, which is pretty terrible.

~~~
Dove
Yeah, I'm hoping by "overwritten" he really meant "shadowed". I don't know
MATLAB, but being able to change the value of i seems both useless and
dangerous.

~~~
dpark
I think it's shadowed globally, which has the same effect as it being
overwritten. But I'm not certain.

------
jstanley
Unlike many of the other commenters, I don't see much problem with having "i"
mean the imaginary unit. Where MATLAB falls down though is:

1.) not having lexical scoping

2.) not having warnings when a global is overridden in a local scope

these are pretty inexcusable. For anyone looking for a "better MATLAB", Python
has NumPy, SciPy and Matplotlib which are pretty much everything.

~~~
w0utert
>> _For anyone looking for a "better MATLAB", Python has NumPy, SciPy and
Matplotlib which are pretty much everything._

I hear this at work all the time, "just use Python, we're going to use Python
everywhere we use Matlab from now on". It simply doesn't work like that.

Usually it's programmers who are overly enthusiastic about replacing
Matlab with Python, because they are comfortable with all aspects of
programming languages. The people typically writing the most complex Matlab
code aren't; more often than not they are physicists or mathematicians. Python
+ SciPy + Matplotlib is not nearly as accessible as Matlab, and you drag in
all kinds of typical 'software engineering' problems that don't exist when you
stay inside the Matlab IDE and its related tools. I can conjure up a plot in
Matlab and manipulate it incrementally for example, while Matplotlib often
requires a lot of crazy data mangling and setup code that isn't required in
Matlab (trust me, I've spent months re-implementing a library of Matlab
plotting scripts in Matplotlib, and the code of some of them is completely
untraceable to the original Matlab code, and full of stuff completely
unrelated to plotting). Also, lots of Matlab code relies on all the crazy
implicit operations and type juggling allowed by Matlab, many of which are
almost fundamentally incompatible with a well-defined programming language
suited for production code.

This is all even ignoring the fact that Matlab comes with a pretty extensive
library of toolboxes that have no equivalent in SciPy or Numpy. If you're
really determined and know how to program in Python, you can basically get a
lot of stuff done without requiring Matlab, but if you've always been using
Matlab like a prototyping tool or a very elaborate graphical calculator (and
this constitutes a very extensive part of Matlab users), Python + all its
scientific libraries is no substitute.

~~~
jstanley
Good points, and that provides some perspective I hadn't considered.

However, I still think my points about where MATLAB falls down are valid.
There is no reason for it not to have lexical scoping, or warnings about when
you are doing stupid things. It just doesn't have them, and it causes
frustrations like the one in this article.

~~~
w0utert
Oh totally agree, Matlab-the-language is terrible, and Python is a million
times nicer to work with. My comment was only about the suitability of Python
+ Numpy + SciPy + Matplotlib as a Matlab replacement for non-programmers.

------
untog
I made a similar error once working in Adobe Air (if I recall):

    
    
        for (x = 0; x < thing.length; x++) { }
    

I didn't put a "var" before the x, so Air used the window.x object, and
shifted the entire contents of the window to the right by a seemingly random
number of pixels.

To look at it like that, it's blindingly obvious. But it was in the middle of
a whole lot of other code, and of course, the for loop ran perfectly fine.

------
yuvadam
Nice anecdote, but that's a rather uber-sensational title - not in a good way.

~~~
cfinke
Agreed; I expected it to be the story of Therac-25
(<http://en.wikipedia.org/wiki/Therac-25>)

 _It was involved in at least six accidents between 1985 and 1987, in which
patients were given massive overdoses of radiation, approximately 100 times
the intended dose. These accidents highlighted the dangers of software control
of safety-critical systems, and they have become a standard case study in
health informatics and software engineering._

~~~
swanson
It's linked to in the first paragraph.

------
revelation
That's not so much a "nice bug" but rather a big fat warning sign for anyone
trying to use MATLAB.

------
stevoski
Matlab: the language designed so poorly it makes PHP look good.

~~~
dubcanada
I love that you said that, I didn't read my daily dose of crying about PHP
today. Thanks!

------
electic
This is not the worst bug ever. Nice story, however, the title is way over the
top.

~~~
bascule
Yeah, I was thinking more like Therac-25 giving people radiation overdoses or
the Patriot missile failures due to floating point roundoff error

------
CodeCube
Kind of along the same lines in the sense of being _my_ biggest facepalm
moment; many years ago in my first professional programming gig we had full SA
access to our sql server. We regularly did troubleshooting via direct SQL
statements (a practice I've since shied away from). One particular day, we
were troubleshooting a problem in our monthly billing process. I went to
delete a few rows and foolishly forgot to include a limiting statement on the
WHERE clause, which resulted in most of the data being blown away.

Thankfully, we were able to restore that table from a backup and rerun
everything to rebuild the table's state to a correct place ... but suffice to
say the experience taught me many things about attention to detail, and how
debugging in production should happen.

Good times!

~~~
Erwin
A good practice there is to do tricky DELETE/UPDATE inside a transaction. In
e.g. Postgres, executing DELETE/UPDATE tells you number of affected rows, so
if you see 1243 rows were updated rather than 1 as you expected, you can
ROLLBACK.
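
A minimal sketch of that workflow, using Python's built-in `sqlite3` module as a stand-in for Postgres. The `invoices` table, its contents, and the expected row count are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO invoices (customer) VALUES (?)",
    [("acme",), ("acme",), ("globex",)],
)
conn.commit()

expected_rows = 1

# With default settings sqlite3 opens a transaction implicitly before the DELETE.
cursor = conn.execute("DELETE FROM invoices WHERE customer = ?", ("acme",))
affected = cursor.rowcount

if affected != expected_rows:
    conn.rollback()  # more rows hit than intended: undo everything
else:
    conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
print(affected, remaining)  # 2 3
```

The WHERE clause here is too broad on purpose: two rows match instead of one, so the rollback leaves all three rows in place.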

~~~
bartonfink
Another good approach is to replace 'delete' with 'select count(1)' or
something similar. The where clause can remain the same, you know how many
rows will be affected and there's less overhead of remembering to set up a
transaction, writing an update statement, rolling back, etc.
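
Sketched with Python's built-in `sqlite3` module (the table, its rows, and the expected count are made up for illustration), the count-first approach might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO invoices (customer) VALUES (?)",
    [("acme",), ("acme",), ("globex",)],
)
conn.commit()

where_clause = "WHERE customer = ?"
params = ("globex",)

# Dry run: same WHERE clause, but only counting.
would_delete = conn.execute(
    f"SELECT COUNT(*) FROM invoices {where_clause}", params
).fetchone()[0]

# Only swap in DELETE once the count matches what was expected.
if would_delete == 1:
    conn.execute(f"DELETE FROM invoices {where_clause}", params)
    conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
print(would_delete, remaining)  # 1 2
```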

~~~
CodeCube
Yeah, if I ever did have to troubleshoot in SQL directly, I definitely started
doing this ... only changing it to update or delete when I was 100% sure it
was affecting only what I wanted to affect.

------
seanpont
As a mechE, I was brought up on Matlab. We were taught to use ii, jj, kk... as
our index variable names. Many years later, I still prefer them to i, j, and
k.

~~~
varjag
Like the stuttering of a shell-shocked veteran, it's the battle scar of a
Matlab survivor.

------
pfortuny
That is a pretty awful MATLAB perk, using 'i' as the sqrt(-1) (and indexing
the arrays from 1...).

It makes polyglot programmers make a lot of silly mistakes. It would have been
SO easy to call the imaginary constant just 'I' instead of 'i'...

~~~
TeMPOraL
Or just make it not mutable. Throw a big, fat error when someone assigns value
to i.
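
One way such a check could look, sketched in Python rather than Matlab. `ProtectedNamespace` is a made-up helper for this example, not something either language ships; it relies on `exec` storing names through the mapping passed as its locals.

```python
class ProtectedNamespace(dict):
    """A dict used as a script namespace that refuses to rebind chosen names."""

    def __init__(self, constants):
        super().__init__(constants)
        self._protected = frozenset(constants)

    def __setitem__(self, name, value):
        if name in self._protected:
            raise NameError(f"cannot assign to built-in constant {name!r}")
        super().__setitem__(name, value)


namespace = ProtectedNamespace({"i": 1j})

# Ordinary assignments still work, and i resolves to the imaginary unit.
exec("z = 2 + 3 * i", {}, namespace)
print(namespace["z"])  # (2+3j)

# Reusing i as a loop counter now fails loudly instead of shadowing the constant.
try:
    exec("for i in range(3): pass", {}, namespace)
except NameError as error:
    print(error)
```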

~~~
pfortuny
This would have been the best way to learn it, some kind of 'CRASH Eh! MAN
your files have been formatted and all this project has been deleted because

i=sqrt(-1)

Sorry for the mess, hope you learn from this...'

But I keep tripping over the same stone...

~~~
KMag
Presumably the parent meant to throw an exception and show a script-level
stack trace, not silently exit with no diagnostic messages.

Insisting on doing something, even if it's the wrong thing, is much worse in
most applications than refusing to run in order to avoid producing incorrect
and misleading results.

------
nollidge
Just want to say thanks to all the Hyperbole Police in this thread pointing
out this isn't _actually_ the worst bug ever.

~~~
swanson
Ya - I was going for a reference to Comic Book Guy:
<http://en.wikipedia.org/wiki/Comic_Book_Guy>

Thought the periods would make it clear. Oh well!

EDIT: Looks like the periods got removed from the title - I can see why that
would make the reference a bit more obtuse.

------
spalletti
It's the most frequent error you can make when writing 'a-little-bit-more'
complex scripts using MATLAB. It's sad to say, but MATLAB cares too much
about backward code compatibility and they didn't fix this! All MATLAB
newcomers make that error. :)

------
geon
On my first programming job (2004), I was adding shopping cart functionality
to a proprietary CMS written in PHP. For some reason I couldn't get a simple
item listing to work. I had spent several hours debugging it before I found
the problem.

The CMS had a database query result class with a method named something like
"numberOfRecords". It returned the number of records minus one...

I told my boss about this, and his answer was "Yeah I know, but we don't want
to fix it because a lot of code depends on that bug."

------
carlob
I'm baffled by the fact that the single most common name for an integer
variable has a special meaning (though it's not a reserved word). I'm pretty
sure this is a bug in Matlab, not in OP's code.

But still I was wondering: isn't that code horribly un-idiomatic for an array
based programming language?

~~~
jarek
I'd bet i has been used to denote the imaginary unit long before it was used
as a loop counter in programming languages. Wikipedia appears to date the
former to sometime in the 18th century.

------
wereHamster
Oh the joys of side-effects. Wouldn't you love to have a programming language
that doesn't have them?

~~~
SoftwareMaven
It would never work for something as side-effect prone as pure mathematics!

------
manojlds
If that is the worst bug ever, I would die a happy programmer. Btw, why would
anyone use the 3*i form when there is the much clearer 3i form?

~~~
daeken
What if you're dealing in imaginary units that aren't constant? E.g. foo*i.

~~~
manojlds
Ah, right. I am surprised there is not a better syntax for it though.

~~~
Stwerp
It should still be ' foo * 1j '

------
why-el
Don't trust it too much: <http://cm.bell-labs.com/who/ken/trust.html>

------
varjag
Fun, but there are worse things than scope shadowing bugs. At least this one
is deterministic and can be zeroed in with a few trace prints.

------
anonymouz
Python (before version 3) leaks variables from list comprehension, while it
seems natural to assume that the comprehension has its own scope. Makes for
some funny bugs:

    
    
      i = ...
      ... # lots of code 
      [ do_stuff(i) for i in range(n) ]
      ...
      # whoops, our outside i has now been modified
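
For comparison, a small sketch of how the same pattern behaves on Python 3 (the variable names are just for illustration): the comprehension variable no longer leaks, but a plain `for` loop still rebinds the outer name.

```python
i = "outer"

# In Python 3 the comprehension variable lives in its own scope.
squares = [i * i for i in range(4)]
after_comprehension = i

# A plain for loop still rebinds the surrounding name.
for i in range(4):
    pass
after_loop = i

print(squares)              # [0, 1, 4, 9]
print(after_comprehension)  # outer
print(after_loop)           # 3
```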

------
solox3
If you call this the worst bug ever (t = [-infty, +infty]), then what's this?

[https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commi...](https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commit/a047be85247755cdbe0acce6f1dafc8beb84f2ac)

------
paines
That is not what I call "worst" bug. It is a pretty cool bug. Screw Matlab for
hiding this from you.

------
geuis
Maybe change the title.

------
antsam
I spent a few hours debugging an OpenGL program I was working on only to
realize that I forgot to account for normalized coordinates. All I had to do
was individually multiply x and y by the canvas size.
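
As a rough sketch of that fix, assuming the coordinates were normalized to the range [0, 1] (coordinates in OpenGL's [-1, 1] clip range would need an extra shift first); `to_pixels` and the canvas size are hypothetical:

```python
def to_pixels(x_norm, y_norm, canvas_width, canvas_height):
    """Scale coordinates normalized to [0, 1] up to pixel units."""
    return x_norm * canvas_width, y_norm * canvas_height


print(to_pixels(0.5, 0.25, 800, 600))  # (400.0, 150.0)
```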

Always start with the small, obvious stuff first!

------
jblock
For some reason, I've found more bugs like this in MATLAB than in any other
domain. Ever.

------
agumonkey
Colleague had a long long debugging session doing MDA based code generation,
turns out JET (java/ibm/eclipse) template engine had a stupid arithmetic bug,
the kind you'll only investigate after everything in the stack has been
checked.

------
Pinckney
Thanks for posting this -- I just went and fixed just such a loop in my own
code.

~~~
swanson
Ha! Glad I could help :)

~~~
Stwerp
Me too! I changed my array counter to 'j'. :P

I work in matlab a lot and have had this error creep up several times. It's
very annoying, to say the least.

~~~
swanson
The better fix - which I mentioned as well - is to use `2+3i` instead of
`2+3*i`. The former will always use sqrt(-1).

~~~
Stwerp
I agree. I think I picked up this syntax from Python. My response was intended
as tongue-in-cheek answer since Matlab allows both i and j as the complex
number. I mainly see this when editing or modifying scripts written by others
in my lab and have to pinpoint their counter usage.

------
SaulOfTheJungle
Why did he get the correct results when the script was run stand-alone?

~~~
earless1
He was passing the numbers directly when running stand-alone. His other code
was looping through an array.

------
teilo
I don't know what I would do without list comprehensions. I can hardly imagine
using iteration loops and counters anymore, and this is one among many reasons
why.

~~~
recursive
List comprehensions don't really handle performing a list of actions for their
side effects alone.

~~~
teilo
The example in the code was not a side-effect. They were creating a new list
by invoking a function on every member of the old list.

Here's the same thing re-written in Python:

    
    
      d = [compute_diffraction_at_wavelength(x, WAVELENGTH) for x in LensLayers]
    

Or oldschool Python (and still faster, I think):

    
    
      d = map(lambda x: compute_diffraction_at_wavelength(x, WAVELENGTH), LensLayers)
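
A self-contained sketch of both spellings. The body of `compute_diffraction_at_wavelength` is a dummy stand-in (the real function is not shown in this thread) and the input values are invented.

```python
def compute_diffraction_at_wavelength(layer, wavelength):
    # Placeholder arithmetic only; the real computation is not known here.
    return layer * wavelength


WAVELENGTH = 0.5
LensLayers = [1.0, 2.0, 3.0]

# List comprehension: one call per element, collected into a new list.
d_comprehension = [compute_diffraction_at_wavelength(x, WAVELENGTH) for x in LensLayers]

# map needs a callable, so the fixed second argument is bound with a lambda.
# In Python 3 map returns a lazy iterator, hence the list() call.
d_map = list(map(lambda x: compute_diffraction_at_wavelength(x, WAVELENGTH), LensLayers))

print(d_comprehension)           # [0.5, 1.0, 1.5]
print(d_map == d_comprehension)  # True
```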

------
jtanderson
Oh, the things a little syntax highlighting can prevent...

------
seanhandley
"those where"

</grammar nazi>

~~~
swanson
Fixed, thanks.
[https://github.com/swanson/swanson.github.com/commit/81c14ed...](https://github.com/swanson/swanson.github.com/commit/81c14ed32a5d603bff95ccac48f5e405902c4514)

------
grayrest
Further evidence that j is the one true imaginary unit.

Title is still sensationalist.

------
erikb
Who upvoted this?

------
martinced
Nice article but there's a lesson here.

First I'll come with an old bug of mine (early nineties). The bug was very
hard to reproduce because it was very very occasional (but terrible in
consequences). I simply couldn't find it. I knew I was probably smashing some
memory somewhere (I was not that experienced in C) but couldn't find where.

So what did I do after weeks of hunting the bug? I moved my design away to an
entirely deterministic one and started recording all the inputs the program
received and then I could replay them. At one point, logically, the bug
happened. And so was it "recorded". Because my program was now deterministic,
I could simply feed it the inputs (and the time at which they happened) and,
surely enough, the bug was there.

And then finding the cause of the bug was of course trivial.
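
The record-and-replay idea can be sketched in a few lines of Python. Everything here is hypothetical: `step` stands in for the program's deterministic update, the random numbers stand in for real inputs, and timestamps are left out for brevity.

```python
import random


def step(state, event):
    """Pure update: the next state depends only on the current state and the event."""
    return (state * 31 + event) % 1009


def run(events, log=None):
    """Feed events through step(); optionally record each one for later replay."""
    state = 0
    for event in events:
        if log is not None:
            log.append(event)
        state = step(state, event)
    return state


# Live run: the inputs are unpredictable, so every one is recorded.
recorded = []
live_result = run((random.randrange(256) for _ in range(1000)), log=recorded)

# Replay: the same inputs in the same order reproduce the same final state.
replay_result = run(recorded)
print(replay_result == live_result)  # True
```

Because `step` has no hidden state, a recorded run that hits the bug will hit it again on every replay.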

What's the lesson?

Well, first obviously bugs that you cannot easily reproduce because you cannot
"reproduce the state" are typically kinda hard to track down.

The less "state" in your program, the easier it is to reason about your
program and the easier it is to reproduce the state.

The lesson?

Functional programming rocks.

