
Are you one of the 10% of programmers who can write a binary search? - MikeTaylor
http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/
======
raganwald
To be precise, the post says that only 10% of programmers can write it
correctly without testing. I agree that this is an interesting statistic;
however, the industry has been moving away from implementing an entire
algorithm in one go in favour of implementing algorithms incrementally. Were I
to write it
without testing I would probably write it incrementally and try to perform the
testing in my head as I went along. It feels very retrograde to try to write
the whole thing out and only then start thinking about off-by-one errors,
computer arithmetic issues, and other edge cases.

~~~
cschneid
It feels like a totally made up problem:

"Here's a binary search algo description, implement it perfectly without
testing".

That's not even close to how developers work, and he amplifies the disconnect
with the real world by choosing a problem known for its boundary and
off-by-one risks.

~~~
DLarsen
While this scenario may not allow developers to work like they normally do, I
think the author's point is that a simple binary search isn't _that_
complicated. A decent dev should still be able to do a pretty good job despite
those artificial constraints.

~~~
brazzy
And the author is simply wrong about his point (if that is indeed his point).
A binary search _is_ that complicated. It just doesn't look like it until
you've considered all the possible edge cases, and chances are that most
"decent devs" will miss some and be part of the 90%.

~~~
MikeTaylor
Speaking as the author: his point, if any, is more along the lines that if we
need a test-as-you-go approach to get something this simple to work, it's no
wonder we have problems when we have to implement something that's actually
complex.

Also: no amount of test-drivenness can save you from just not grokking the
problem (as Peter Norvig's Sudoku solver showed ... er, by counter-example.)

~~~
124816
I'm missing the back story -- tell me more about Peter Norvig's solver?

~~~
nicko
<http://pindancing.blogspot.com/2009/09/sudoku-in-coders-at-work.html>

<http://norvig.com/sudoku.html>

------
RiderOfGiraffes
Short answer: Yes, I am, including without testing.

For several years I ran a challenge for exactly this question. I have a test
harness and test suite, and would run people's programs through it. Roughly 1
in 30 or 40 of those submitted without testing (self-certified) would pass all
the tests. Those who tested before submission passed about 1 time in 3.

The vitriol heaped on me was quite unbelievable, and in one case led to
concerns for my physical safety. People who had decided to try the challenge
then rounded on me and said it was unfair, unreasonable and unrealistic. I
didn't affect the results - the vast, _vast_ majority of people simply could
not write a binary search that passed my test suite.

FWIW, my first attempt had exactly one problem, to the best of my knowledge. I
decided to derive my code formally from the specification. The mistake
concerned computer arithmetic, not the algorithm, and was the one linked to
from the article.

But my background is unusual.

~~~
raganwald
> Those who tested before submission passed about 1 time in 3

"Programmers are not to be measured by their ingenuity and their logic but by
the completeness of their case analysis." --Alan Perlis

~~~
RiderOfGiraffes
I would add - and their completeness in testing the correct implementation of
those cases, and the correctness of the implementation.

------
Dove
"No incremental testing" strikes me as a rather valueless constraint. Small
experiments are one of the most crucial skills -- if not _the_ crucial skill
-- in writing correct code. It's like asking me to swim with one arm tied
behind my back to see if I am a good swimmer.

Sure, I can run these 'tests' on scratch paper or even in my head, if you
require me to. But real life never requires this. I prefer to prove to myself
that each piece of an algorithm is correct--using a combination of reasoning
and microtesting--before moving on to the next. Why would I waste attention
doing this all in my head? The computer will do many pieces of it for me
quickly, accurately, in a way that catches language idiosyncrasies, and most
importantly frees my attention for considering more abstract errors.

Edit: I had to try anyway, and (as far as I can tell) I _was_ able to do it
correctly in about three minutes. But I suspect it has less to do with native
programming ability and more to do with having written dozens of variants on
binary searches in the past year (I've had reason to want to handle various
types of fuzzy matches, filtering and desired multiplicities in return
values). FWIW, even after a fairly exhaustive suite of random input tests, I
trust this binary search I have just written far less than the many I've
recently written using my usual incremental methodology. Were I to include
this in an actual release, I would not attempt to prove what I just wrote
correct. I would destroy it and do the job properly from scratch.

Which rather underscores my point. I just wrote some code the wrong way, and
even in spite of quite a lot of recent domain experience, a pretty good test
suite, and a lot of staring at it and thinking about it . . . _I don't trust
it because I didn't test the pieces while I was writing it_.

~~~
RiderOfGiraffes
Do you want to send me your code for testing in my test suite? You've
obviously tested it now, but it would still be interesting.

~~~
Dove
All right. It's in perl. Here's what I have:

    
    
      sub binary_search {
        my($ary, $val) = @_;
      
        print "Looking for $val...\n";
      
        $val < $ary->[0] and return undef;
        $val > $ary->[-1] and return undef;
      
        my $min = 0;
        my $max = @$ary - 1;
        my $mid = int(($max + $min)/2);
      
        while($max - $min > 1) {
          $ary->[$mid] == $val and return $mid;
          $ary->[$mid] >  $val and $max = $mid;
          $ary->[$mid] <  $val and $min = $mid;
      
          $mid = int(($max + $min)/2);
        }
      
        $ary->[$min] == $val and return $min;
        $ary->[$max] == $val and return $max;
      
        return undef;
      }
      
      my $ary_size = int(rand(10));
      my @ary;
      for(my $i = 0; $i < $ary_size; $i++) {
        push(@ary, int(rand(10)));
      }
      
      @ary = sort @ary;
      
      print join(",", @ary), "\n";
      my $rv = binary_search(\@ary, int(rand(10)));
      print "Returned $rv\n";
    
    

I see already that it has the famous 'large array' bug. Not something I
encounter in my work, so I'm not on the lookout for it. I'm curious if you'll
find anything else wrong with it.

~~~
RiderOfGiraffes
Can/do you write in C? Would someone care to translate this?

I'll do it tomorrow at work, but if someone can post and/or email me an
equivalent C version it will reduce my time. Not a problem, but just more
convenient.

You'll need to give me some way to contact you so I can send you the results.

EDIT: I've just realised the spec I usually give asks for the first occurrence
of a key to be returned in the case where there are duplicate keys. I'll
modify my tests to account for that. It's an especially useful feature in a
binary search, but rarely implemented.
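
For illustration, here's a sketch of that first-occurrence variant (in Python
rather than C, and with names of my own invention -- not the spec or harness I
actually use):

```python
def binary_search_first(ary, val):
    """Return the index of the FIRST occurrence of val in sorted ary,
    or None if val is absent."""
    lo, hi = 0, len(ary)               # first occurrence, if any, lies in [lo, hi)
    while lo < hi:
        mid = lo + (hi - lo) // 2      # overflow-safe form, out of habit
        if ary[mid] < val:
            lo = mid + 1               # everything at or below mid is too small
        else:
            hi = mid                   # ary[mid] >= val: answer is at mid or left
    if lo < len(ary) and ary[lo] == val:
        return lo
    return None
```

The trick is not returning early on equality: even when ary[mid] == val there
may be an earlier copy, so the loop keeps narrowing until lo == hi.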

~~~
Dove
Having to do it in C is even _more perverse_ than doing it in my head.

;)

~~~
Dove
In all seriousness, if you have to translate it, don't bother. I write in C
nearly daily, and I certainly wouldn't trust myself not to lose something in
translation.

I'd be interested in what's in your test suite, though. I must sheepishly
admit that my 'fairly exhaustive' test consisted of running the above random
input until I'd seen all the usual tricky cases go by:

    
    
      - an array of length zero
      - an array of length one
      - an array of length two
      - the value is present
      - the value is absent and too small
      - the value is absent and too large
      - the value is absent and in the middle
      - the value is the first entry
      - the value is the last entry
      - the array is all one value 
      - there are multiple copies of the value you're looking for
    

If you've got anything more interesting in that test suite, that'd be good to
know. Like I said, I represent the one-bazillionth of programmers that
actually have a recurring professional interest in correctly coding binary
searches.

~~~
joezydeco
An array of length Maxint?

~~~
ableal
Good call. As per the link in <http://news.ycombinator.com/item?id=1277701>
(elsewhere on this page):

 _the binary search program that Bentley proved correct and subsequently
tested in Chapter 5 of Programming Pearls contains a bug [...] it fails if the
sum of low and high is greater than the maximum positive int_
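
The failure is easy to reproduce by hand. Python's integers don't overflow, so
the sketch below simulates C's 32-bit wraparound explicitly (illustrative
values, not Bentley's actual code):

```python
def to_int32(x):
    """Reinterpret x as a signed 32-bit integer, the way C's int behaves."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

# Plausible bounds partway through a search of a ~2-billion-element array:
low, high = 2**30, 2**31 - 2

buggy_mid = to_int32(low + high) // 2   # low + high exceeds INT_MAX, wraps negative
safe_mid = low + (high - low) // 2      # the standard fix: the difference can't overflow

print(buggy_mid)  # negative -- an out-of-bounds index in C
print(safe_mid)   # a sensible midpoint between low and high
```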

------
njharman
Nope and can't say it ever came up in the 20 years I've been programming for
myself and others.

~~~
Dav3xor
I've written one. It was for an embedded system.

A good, well tested binary search is a pain in the ass to write...

1\. Your sorted array MUST use exactly the same ordering as the search is
expecting. We built and sorted the array on a PC, and ran the search on the
device, so this was not guaranteed.

2\. There can be NO DUPLICATE KEYS. Duplicate keys will eventually send the
search down a blind alley and off into hyperspace. Also devilishly difficult
to find if the array has a lot of items, and you don't know what the problem
is.

3\. The above 2 problems can sneak in during production with new data.

~~~
RiderOfGiraffes
In a well-implemented binary search duplicated keys are not a problem. You can
even tune your code to return the first, the last, or a "random" one.

The point about the sort order matching the search order is more serious.

I've written all sorts of fundamental algorithms for embedded systems and I
only really started to get good when I wrote the spec, wrote the proof, then
derived the code. This was done informally, but a background in Pure Math let
me do it without too much effort.

And it was worth it. Code developed that way tended to run first time with no
surprises. After a while we increased the number of things we needed to prove
about the code, and reliability was solid.

It seems the world of programming divides into several sub-classes, including
(but not limited to):

\+ Those who program web sites and databases

\+ Those who program numerical systems and simulations

\+ Those who program embedded devices and fundamental algorithms.

These are broad, and there are no doubt many more, but the skills required in
"programming" seem to be dividing and become more differentiated.

~~~
Dav3xor
Yeah, the duplicate key business came in because of other code hanging off the
side of the search (a rather hairy algorithm that was searching for the next
key with a different letter in a given position -- given KLXX, increment the
2nd position until there's a key matching that criterion: KLXX --> KMAA, or if
there are no KM__'s, then KN__, KO__, etc.).

------
barrkel
I've written binary search for production code (Delphi RTL generic
collections), and gotten it right, but more embarrassingly, I also wrote
quick-sort and got it wrong. It was a mishmash of two strategies IIRC - a
straight recursive quick-sort, and a half-inlined version with a loop, and it
didn't work at all. It escaped into the wild owing to time crunch and my
electing to squeeze too many things in at the last moment.

~~~
eru
If your compiler/interpreter supports tail-call-optimization you should get
the half-inlined version for free.

------
robryan
Nothing like an "only x% of you can write this" challenge to get programmers
frantically posting code fragments.

------
abstractbill
_NO TESTING until after you’ve decided your program is correct._

Why? What is there to be gained by asking programmers to abide by such an
unnatural constraint? I would never "decide my program is correct" without
having tested at least bits and pieces of it in the REPL.

~~~
LiveTheDream
Because every programmer's gut reaction to binary search is "that's easy!".
The no testing rule says "orly? then show me"

~~~
abstractbill
_every programmer's gut reaction to binary search is "that's easy!"_

Perhaps I'm not a programmer then. Or I'm a strange one I guess.

I tend to approach _every_ programming problem assuming the worst: This will
be much harder than it seems. There will be corner-cases. I will miss some of
them and do the wrong thing for other ones. I should try to enumerate every
weird condition I can think of, and then throw random inputs at things to
catch the stuff I missed.

~~~
sukuriant
Sounds like I'm not a programmer, either. Working on the Programming Team for
my school for many years taught me to consider edge-cases.

And personally, there's quite frequently SOMETHING I forget when writing a
binary search, and it usually ends in an infinite loop -- so I often have to
go find my stupid mistake.

------
more_original
Nearly All Binary Searches and Mergesorts are Broken:
<http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html>

------
DavidMcLaughlin
> One of the reasons I’d like to see a copy of the Second Edition is to see
> whether this passage has changed

For the author: it hasn't changed. Still only 10%.

I actually had to write this binary search algorithm in my technical
competency interview for my current job. Rather than hours, I had minutes and
the extra gotcha was that it was unsorted data so I had to write quicksort
first. I remember feeling excited that I was going to be working for a company
that had such low level issues but eight months in it's still the only time
I've written my own sort or search methods since University :)

------
stcredzero
A friend of mine once did work in C++ for the DoD. There was a logistics
application that used Binary Search, which had a bug he was working on -- lots
of items were somehow "misplaced" in the system. Eventually, he discovered
that there was no guarantee that the list being searched was _sorted_!

As for writing out whole C programs and getting it right without debugging --
I once wrote a bit-banging controller for a model aircraft. We did bit-banging
with a Gumstix board, which would let us use GPIO to implement 9 PWM inputs
and 9 or 10 PWM outputs in software. I wrote a bit-banging loop that did all
of the inputs and outputs and let programmers define control mixing functions
and finite state machines.

I was unable to test it, because you can only test something like this by
attaching to the hardware. Well, _it worked the first time and it can run the
remote control of a model aircraft in flight_. (There was one tweak later, to
eliminate some jitter in one of the PWM outputs, but that was just a
refinement of function. It _just worked_.)

------
ohashi
I wrote my version in PHP, wondering if someone could tell me what I did wrong
(I didn't test it, I assume there are mistakes).

    
    
      <?php
      function bsearch($search, $array, $start, $end){
          //$start cannot be zero.
          //$search not in $array
          if($start>$end){
              return -1;
          }
          //calculate middle
          $middle = $start + ceil(($end-$start)/2);
          //check match
          if($array[$middle]==$search){
              return $middle;
          }
          //is the middle higher than the search, if so, cut top
          else if($array[$middle]>$search){
              return bsearch($search, $array, $start, $middle-1);
          }
          //otherwise cut bottom
          else{
              return bsearch($search, $array, $middle+1, $end);
          }
      }
    
      ?>

~~~
gizmo
Yup, no cigar.

1\. You get a division by zero when $start == $end

2\. When the element $search doesn't exist your program breaks

3\. You always check array for value $middle, so $middle should have the range
0 <= $middle < length($array). But according to the comment $start cannot be
zero. If so, then $middle is always larger than zero, and as such, the first
element of the array is never considered.

So it's pretty buggy.

~~~
ohashi
1\. I knew there was an issue with start==end. What is the solution for that
issue? I had originally written $start<=$end

2\. I think this is connected to the previous issue, with ending it

3\. I knew the zero thing would be an issue, but... had to write it before the
caps game started :(

------
cousin_it
Yay, I did it in 20 minutes and even remembered to test for the empty array
:-)

It _was_ kind of stressful to have to pass 13 testcases on the very first
attempt. I rigged them to just shout "YOU FAIL" at me without telling the
specific place. Reread the code for one last time and ran it. After it
completed without printing anything, I stared at the screen for like ten
seconds, then went and changed one testcase to deliberately fail in case I'd
screwed up the printing. Nope - everything went better than expected :-)

But Raganwald is right, this exercise doesn't really tell anything about my
programming skill. I mean, I don't work like that at all. My first attempts
are _always_ broken. Iterating is faster (or maybe only feels faster) than
trying to get the first attempt right.

------
Jach
I remember reading the linked post on the Google blog. Ah, here it is:
<http://news.ycombinator.com/item?id=1130463> (Wow, that was only 60 days
ago?)

Maybe that's why when I did this in C just now, my version ended up almost
identical to the Java version. (Mine does the more correct "int middle = low +
(high - low) / 2;", and I wasn't even thinking about the possibility of
integer overflow the other way.) So yes, but only from the benefit of having
written it numerous times in school and in teaching newbies to use it.

------
chipsy
I started writing a solution and then stopped partway through because the
pressure of getting it right in one shot was incredibly stressful.

If this were given to me for an interview question, I'd be incredibly
meticulous and build a series of algorithms for len=0, len=1, len=2, etc.,
including various permutations of values, until I establish a general
solution, which I would then submit for the final answer.

I would hardly ever do that in actual coding, though, because I could rely on
hammering it with tests instead.

------
ableal
Side-issue: why isn't Golden section better known?

Luenberger's book [1] convinced me that a line search should be done with
golden section, when I was hacking non-linear optimization. Yet I rarely see
it mentioned - seems always to come down to binary search ...

[1] <http://www.amazon.com/Linear-Nonlinear-Programming-David-Luenberger/dp/0201157942>

~~~
RiderOfGiraffes
Binary means division into two. Dividing in unequal parts is still a binary
search.

And Golden Section is only indicated under some circumstances. It's the same
in big-O terms, but not in the constant factors, and it depends on the exact
details. It's also hard to get those details right.
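
To make the contrast concrete, here's a sketch of a golden-section line search
in Python (the textbook form for minimising a unimodal function -- note this
is a different problem from membership search in a sorted array):

```python
import math

def golden_section_min(f, a, b, tol=1e-8):
    """Locate the minimiser of a unimodal f on [a, b].
    Each step shrinks the bracket by 1/phi and reuses one interior
    evaluation, so it costs only one new call to f per iteration."""
    invphi = (math.sqrt(5) - 1) / 2        # 1/phi, about 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                        # minimiser lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                              # minimiser lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

Getting the evaluation-reuse bookkeeping right is exactly the "details" part;
done wrong, it degrades to two function calls per step and the advantage
disappears.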

~~~
ableal
Thanks, good point on the binary ...

------
joegaudet
This is interesting, a somewhat related post can be found here:

<http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html>

The gist is that the version of binary search implemented in the JDK contained
a bug due to an overflow error. I recall reading about it some time ago; funny
to see it come up again.

------
derwiki
I'm amazed how many interviewees can't even properly explain binary search,
down to saying "linked list" as the base data structure.

~~~
raganwald
> "linked list" as the base data structure.

I would enjoy reading a logically correct implementation of a binary search on
a linked list. Discussing the performance complexity would be an obvious next
question in the interview.

<http://en.wikipedia.org/wiki/Schlemiel_the_Painters_algorithm>
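
Just to sketch what that would look like (Python standing in for the
whiteboard; the names are mine): the search logic is unchanged, but every
probe walks from the head, which is where the Schlemiel flavour comes from.

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def nth(node, n):
    """Walk n links forward from node -- O(n) per call."""
    for _ in range(n):
        node = node.next
    return node

def list_binary_search(head, length, val):
    """Binary search a sorted singly linked list of known length.
    O(log n) comparisons, but O(n log n) pointer hops in this naive form."""
    lo, hi = 0, length
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if nth(head, mid).value < val:   # each probe re-walks from the head
            lo = mid + 1
        else:
            hi = mid
    return lo < length and nth(head, lo).value == val
```

(A slightly smarter version advances a saved pointer instead of re-walking,
bringing traversal down to O(n) total -- at which point you may as well have
done a linear scan, unless comparisons are very expensive.)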

~~~
brazzy
Well, a skip list _is_ just a bunch of linked lists...

~~~
sesqu
I would describe it more as a multiply-linked list. The important distinction
in a skip list is that you get random access, in a different sense of the
term, and so binary search makes sense. Similarly, you could justify a binary
(or another non-sequential) search on a traditional linked list by assuming
huge evaluation costs.

------
va_coder
Here's a link to some of his work. A big fat hairy perl file:

<http://cpansearch.perl.org/src/MIRK/Net-Z3950-Simple2ZOOM-1.04/lib/Net/Z3950/Simple2ZOOM.pm>

Knowing how to write code that is easy to read is just as important, if not
more so, than knowing how to write binary searches.

~~~
johnfn
How is that relevant? It's essentially equivalent to an ad hominem. Just
because the guy wrote bad code doesn't mean he might not have a good point.

~~~
invisible
It is not an ad hominem because he was not arguing that the point was invalid,
rather that the guy is foolish to make such a statement while not himself
writing readable, well-structured code. Notice "...is just as important..."

I agree - that file he linked is very difficult to read and would make it much
harder for me to find problems.

------
noarchy
Maybe I was fortunate; I learned how to write a binary search in my lower-
level CS courses. I didn't even find it particularly difficult, but then
again, the concept was explained quite clearly before I had to code it.

~~~
barrkel
The trick with this binary search test isn't knowing or being taught or
learning the algorithm, or finding it difficult; it's making sure you find and
deal with every corner case, first time, without testing. For example: is an
empty array handled properly? An array with duplicates? Is every item,
including first and last, found correctly? What about arrays with an even
number of elements vs odd, vs power of two? How about an array with over 1
billion elements (the integer overflow problem)? Are values before and after
the sorted range indicated properly? If working with duplicates, do you want a
found index at the start of the range, or at the end?

You get the picture.
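
Those cases are small enough to check exhaustively rather than by intuition. A
sketch of such a harness (Python; search() here is a stand-in implementation
under test, returning the index of the first occurrence or None):

```python
from itertools import combinations_with_replacement

def search(ary, val):
    """Implementation under test: index of first occurrence of val, or None."""
    lo, hi = 0, len(ary)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if ary[mid] < val:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(ary) and ary[lo] == val else None

def oracle(ary, val):
    """Trivially correct reference: linear scan for the first occurrence."""
    for i, x in enumerate(ary):
        if x == val:
            return i
    return None

def exhaustive_check(max_len=6, alphabet=range(4)):
    """Every sorted array up to max_len over a tiny alphabet, duplicates
    included, probed with present, absent, too-small and too-large values."""
    for n in range(max_len + 1):
        for ary in combinations_with_replacement(alphabet, n):  # sorted tuples
            for val in range(-1, 5):
                assert search(list(ary), val) == oracle(list(ary), val), (ary, val)
    return True
```

A small alphabet matters here: with only four distinct values, duplicate-heavy
and all-one-value arrays dominate the cases, which is exactly where naive
searches go wrong.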

~~~
waterlesscloud
So what this is really all about is this: do you mentally step through your
code and consider what could go wrong at each step?

~~~
barrkel
I would rather think in invariants - knowing what must be true at particular
points in the code, at the start of loop bodies, etc. Essentially, doing a
proof in one's head.
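
For instance (a Python sketch; the assertions are just those head-proof
invariants written down):

```python
def bsearch(ary, val):
    """Return an index of val in sorted ary, or None if absent.
    The asserts state the invariant one would otherwise carry mentally."""
    lo, hi = 0, len(ary) - 1
    while lo <= hi:
        # Invariant: if val is anywhere in ary, it is within ary[lo..hi].
        assert all(x < val for x in ary[:lo])
        assert all(x > val for x in ary[hi + 1:])
        mid = lo + (hi - lo) // 2
        if ary[mid] == val:
            return mid
        elif ary[mid] < val:
            lo = mid + 1   # ary[0..mid] are all < val, so the invariant holds
        else:
            hi = mid - 1   # ary[mid..] are all > val, so the invariant holds
    return None            # the interval is empty: the invariant says val is absent
```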

------
FutureNerd
Some people here think this is about coding without testing. No, it's an
example of thinking through _before_ testing, about the kind of thinking
that's separate from testing. It's not meant as an example of what all
programming is about, but a probe of one aspect of programming. So why do it?

Is it okay to write down a guess, then look at it with edge cases in mind, fix
and iterate? I think sketching is okay for this exercise as long as you follow
through and finish.

The difference is that sketching doesn't tell you what the bugs are, whereas
testing can tell you specifically. One's an aid to thought and the other's...
an alternative, to be neutral about it.

A couple people have described case analysis like simulating tests in your
head. I hope they actually use more principled thinking than they're letting
on.

Iterative coding breaks into two cases: either the task is
subjective/aesthetic, like an interaction or layout problem, or it really is
an exact task but you don't have it clear in your mind. Everybody recognizes
the value of iterating on a fuzzy problem, put that aside. The question is, if
it's an exact problem but you don't understand it well, then will test-writing
clarify or obscure things?

Sometimes I iteratively write and run or test a program until I get to a point
where I say, "Okay, _now_ do I understand what this is about? Can I explain it
thoroughly in full sentences?" Even so I'm sure I've fooled myself with tests.

In real world coding, you aren't given small problems. That doesn't mean small
problems are poor examples. Obviously we should break big problems into as
small and comprehensible problems as possible. Testing doesn't do that by
itself (although it can create incentives to).

Not to mention the fact that there's no such thing as a complete test for a
program with an infinite set of possible inputs. You can test for what seem to
be the kinds of bugs the program could have.

The unstated assumption of many here is that somehow in the process of testing
we'll come upon all the right tests for that program, without any sneaky
missed cases. Then you will look at the tests and code and somehow extrapolate
or judge that you've covered all the bases. But how sure is this if you still
aren't sure exactly what the problem is, or if you don't think about bugs and
cases very well?

There's a kind of thinking you have to do within and around testing anyway.
The write-before-testing challenge tests how well you do that necessary
thinking, and how reliably you judge how well you've done. It's an artificial
light on a real issue.

Btw, my test file, meant to be easily processed in any language:
<http://www.mac-guyver.com/switham/2010/04/Binary_Search>

