Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
A common bug in published code (google.com)
140 points by shawndumas on April 21, 2011 | hide | past | favorite | 71 comments


Someone should write a tool that scrapes these results and automatically files bugs at the associated bug trackers for open-source projects.


I've been thinking of doing something and i'm looking for people to work with! Notes here: http://memeschemes.com/code_search/


A bug in that program would certainly be wonderful. Have fun when it files a few million unnecessary bug reports while you're sleeping.


Have it require approval for each bug report. While it isn't as fast as a fully automatic bug finding system, it is faster than a manual find and file. Plus it takes care of those situations where it was intended(poor coding, but it works).


I've had some similar ideas myself, consider me interested. I've written some (very little) code to search projects among the big open source hosts (SF, Google, Github etc). Also have a nice domain for it.


This will be very useful as long as it finds confirmed bugs. Otherwise it will be more like an unasked-for code style check (For example, one can argue using functions like strcpy are unsafe, but unless it's really possible to get too many characters in the buffer, it's not a bug)


I've been wanting to do that for:

* NPM packages whose package.json file is missing a link to the package's GitHub repo

* Markdown READMEs without the right file extension (for GitHub)


For those who haven't picked up the error: Math.random() returns a value between 0 and 1, and the typecasting floors the value.


See the previous HN discussion: http://news.ycombinator.com/item?id=960886


Oh, so it always rounds down to zero, that's pretty bad. Casting it to a double would help right?


Instead of

int runs = (int) Math.random() * 1000000;

You'd do

int runs = (int) (Math.random() * 1000000);


That would be pointless, because Math.round already returns a double.


One would hope at some point a tool would warn you about this since clearly it could optimize (int) Math.random() to just 0.

[queue debate about tools that hold your hands vs understanding what you are actually writing]

I wonder if they did the search for if (x = y) bug pre-gcc-4.x-warn what sort of numbers they would get.


"queue debate about tools that hold your hands vs understanding what you are actually writing"

For a start, explain why, in Java, (int) Math.random() is a bug. A programmer that happens upon this thread, now knows "this is bad", but has no clue why. A tool will only make the programmer even more reliant on tools and IDEs, so this kind of bug will be eliminated, but the frame of mind that spawned it will live on.


Findbugs, the popular Java tool I linked to in another comment, explains the bug like this:

"A random value from 0 to 1 is being coerced to the integer value 0. You probably want to multiple the random value by something else before coercing it to an integer, or use the Random.nextInt(n) method."

That's about as good an explanation as you'd get from anyone.

The way I see it is this. Simple bugs like this happen because _human beings are flawed._ All it takes is a momentary lapse of concentration and you've put the cast in the wrong place in a method you don't write a test for because you're in a hurry, and all it's doing is generating a random number with a standard API so why bother? (Or it's something you wouldn't normally even test for, like assuming something is re-entrant when it isn't)

We're inevitably going to make a certain number of mistakes a day, and it's our duty to put systems into place that catch those mistakes before they cause any more damage than they should.


I believe this is the tool you are looking for.

http://findbugs.sourceforge.net/bugDescriptions.html#RV_01_T...


I'm amazed to still find Java programmers who don't run Findbugs on all their code. It's free, fast, and has a low rate of false positives.


> queue

Speaking of common bugs...

(The word you're looking for is 'cue'.)


Unless we're talking about adding the debate to the queue of debates we're having, that is. Even then, I suppose, it should be enqueue...


Were you actually confused, or are you just picking nits? If the latter, then it's not really a bug is it.


Actually, nits are bugs. They just haven't hatched yet.

[1] http://en.wikipedia.org/wiki/Louse


Last year I found several instances of the printf format "200%d" when dealing with years. Some still exist: http://hewgill.com/journal/entries/548-y2010-bugs-found-via-...


I think that's called job security in some circles. ;)


Apparently, python only has 5 instances of the corresponding error: http://www.google.com/codesearch?hl=en&lr=&q=\s%2Bin...

Python-Java flame-war, anyone?


Slight difference here. There are actually two types of errors in the Java code:

1. int foo = (int) Math.random() * some_max_value;

The error here is assuming that the multiplication takes place before the truncation. This isn't happening in the Python code because int(expression to truncate) is unambiguous. (+1 to Python here for making it hard to shoot yourself in the foot).

2. int foo = (int) Math.random();

The error here is assuming that Math.random returns something outside [0.0, 1.0). This is the error that all five of the Python examples are showing. (Boo to Python AND Java programmers.)


Third error: This is not the correct way to randomly pick a number in a set range. The proper way is actually quite complicated. Imagine you do (int) (Math.random() * 10), this could give you numbers from 0 to 10. However, you only get 0 if Math.random() * 10 is less than 0.5, but you get 1 if the value is between 0.5 and 1.5. You are half as likely to see a zero!

I can't speak for Python, but in Java it's quite simple to do it right; Random#nextInt(int) "Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive)" (also consider using SecureRandom).


We're all aware that casting to an integer does truncation, not rounding. Right?


At least in Java and other languages, when casting as int truncates, and doesn't round the number, so: ((int) (0.9999999)) == 0 But then there's Math.floor and Math.ceil for what you're describing which can be used as well.


  public class example
  {
      public static void main(String[] JAVA_LOL) throws Throwable
      {
          // prints "6"
          System.out.println((int)6.99999);
      }
  }


You (and everyone else) are of course right. I remembered the wrong thing, what I said doesn't apply to that method in Java. What does apply is that there are lots of subtleties in generating random numbers, and there's rarely a reason to re-invent the wheel. There's a good example of how subtle pseudorandomness is in the Java documentation, so I will link that instead of embarrassing myself further: http://download.oracle.com/javase/1.4.2/docs/api/java/util/R...


I actually struggle to understand why people even use Math.random() when the Random class exists.


Don't need an import for Math.random(). Random is in java.util, while Math is in java.lang.


Alot of people probably look for Random but find Math.random().


> I can't speak for Python, but in Java it's quite simple to do it right

http://docs.python.org/library/random.html#random.randint


Actually the python equivalent of "Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive)" would be random.randrange(0, n, 1)

random.randint(a, b) returns a uniformly distributed int value between a (inclusive) and b (inclusive).

</nitpick>


import random # will print an integer between 0 and 10 inclusive print randint(0,10)


In that case you can always use floor or ceil.


Clearly, PHP is a better language than both: int rand($min, $max)

ducks

All joking aside, I'm just pointing out the absurdity of comparing one language to another based on how easy it is to shoot oneself in the foot.


I can help that flame war...

I bet those 5 were written by folks with more (or stronger) skill with C/C++/Java other strongly typed language where casting is required/common.

Casting is uncommon and "weird" in Python. Usually means you're being unpythonic. As in this case you should be using randint or randrange rather than cast to int.


There is no need to cast to int in Python:

    a = random.randrange(10000)


im floored


Yeah, I nearly hit the ceiling myself.


Random ints between 0 and 1 offer a pretty limited range of jokes, I guess.

http://en.m.wikipedia.org/wiki/Range_(computer_science) Edit: added range def because of downvotes. Sigh


Your pun is roundly appreciated.


i see what you did there...


the main cause here is that randomness is hard to test so that code was just assumed to be correct. With randomness you can never be sure, http://www.random.org/analysis/


Yes, but... (int)Math.random() always returns zero. A little suspicious.


You can never be sure the next one won't be a 1!


Majority of these results are examples and test cases which is probably why the bugs have not manifested themselves.


Any other examples of bugs that you can search for?


A-mazing

Also, notice the difference in number of results between C and C++ below:

http://www.google.com/codesearch?hl=en&lr=&q=if%5Cs*...

vs

http://www.google.com/codesearch?hl=en&lr=&q=if%5Cs*...


That's why I write comparisons like

if (CONSTANT == variable)



I know it's safer, but personally I find it harder to read. Especially if the constant is a #define or enum and is similar in format to the variable in question.


I personally like Yoda Conditions, but a few coworkers have pointed out that any linter worth its salt will catch an = inside a conditional, so this doesn't actually gain much in safety. I've stopped using them because there a large number of coworkers find them confusing.



Maybe I am feeling dense, but I do not understand what you are trying to point out. None except one are errors as they relate to embedded sql statements (where Equal To is the comparison operator).


maybe we re seeing different results. mine have 8 incorrect assignments in the first page


The code is generating code for a different language -- the = is always in a string.


The first few examples are fine. It's the unscaled ones later on, like "longarr[i] = (int) Math.random();" and especially "[Math.abs((int) Math.random()) % 3];" where the authors didn't notice that random() "returns a double value with a positive sign, greater than or equal to 0.0 and less than 1.0."

Here's another extreme example: (int) Math.random() / Integer.MAX_VALUE % (maxScoreCount + 1);

I eyeballed that about 5% of the calls are in error.


That's not true. Try running the following code:

    public class Test
    {
       public static void main(String[] args)
       {
               System.out.println("Test: " + (int)Math.random() * 100);
               System.out.println("Test: " + (int)(Math.random() * 100));
       }
    }
My results, from repeated tests:

    java Test
    Test: 0
    Test: 59

    java Test
    Test: 0
    Test: 18

    java Test
    Test: 0
    Test: 72

    java Test
    Test: 0
    Test: 11


It is random! http://xkcd.com/221/


Have a look at the precedence table for Java: http://introcs.cs.princeton.edu/11precedence/

The cast has higher precedence than multiplication, so the scaling doesn't help at all, unless you use parenthesis to force the right order of evaluation.


Oh, wow! I didn't know that about Java. I see then why I got all the downvotes.


They are all in error. "(int)Math.random()" always returns 0 because Math.random() returns [0.0, 1.0) (i.e., does not include 1). Casting that return value to a int will always return 0.


They are not in error if you are trying to get a 0. This is yet another good technique for obfuscated code.


Indeed. Take a look at http://www.google.com/codesearch/p?hl=en#TXS-ndgbCls/trunk/j... for instance: if you weren't looking for an incorrect cast, it would be very hard to notice. Two out of three casts are correct: only one is missing the parentheses.


A lot of the results I'm seeing in the search have more problems with order of operations -- so long as you multiply the random() result by a good-sized constant before you cast to int, it actually does what it's "supposed" to do.


This is the point of the post. There are no examples that aren't casting Math.random() to an int before doing anything with it thus using a pretty expensive way to represent 0.


The comment I was replying to was simply saying that casting the return value of Math.random() will be 0. Which is true, but is not the issue -- order of operations (for people who think the multiplication happens before the cast) is the issue.


> so long as you multiply the random() result by a good-sized constant before you cast to int, it actually does what it's "supposed" to do.

Yeah but since cast binds tighter than multiplication, any code following the pattern `(int) Math.random() * a` is broken and generates 0 every single time.

For the code to work, you need `(int) (Math.random() * a)`: http://www.google.com/codesearch?q=\(int\)\s*\(\s*Math\.rand...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: