Hacker News
Are you one of the 10% of programmers who can write a binary search? (reprog.wordpress.com)
86 points by MikeTaylor 2712 days ago | 99 comments



To be precise, the post says that only 10% of programmers can write it correctly without testing. I agree that this is an interesting statistic; however, the industry has been deprecating implementing an entire algorithm in one go in favour of implementing algorithms incrementally. Were I to write it without testing, I would probably still write it incrementally and try to perform the testing in my head as I went along. It feels very retrograde to write the whole thing out and only then start thinking about off-by-one errors, computer arithmetic issues, and other edge cases.


It feels like a totally made up problem:

"Here's a binary search algo description, implement it perfectly without testing".

That's not even close to how developers work, and he amplifies the disconnect from the real world by choosing a problem known for its boundary and off-by-one risks.


While this scenario may not allow developers to work like they normally do, I think the author's point is that a simple binary search isn't that complicated. A decent dev should still be able to do a pretty good job despite those artificial constraints.


And the author is simply wrong about his point (if that is indeed his point). A binary search is that complicated. It just doesn't look like it until you've considered all the possible edge cases, and chances are that most "decent devs" will miss some and be part of the 90%.


It isn't complicated compared to the convoluted stuff that programmers code to implement business "logic." The dirty secret of software development, however, is that 90% of industrial software developers can't handle abstraction at all and rely heavily on domain knowledge when coding anything. Programmers handle the cases that are normal for the problem domain, and QA tests the cases that are unusual in the domain context (and makes the programmer fix them.)

The code will behave incorrectly on some edge cases, but if the developers and QA have covered the problem domain pretty well, it won't matter. First, most of the edge cases will never happen. Second, people get angry when software fails for no obvious reason, but they tend to be a lot more forgiving when software fails on inputs that are "freakish" or "wrong" in the domain context. Searching the widget catalog results in "500 Internal server error" when the widget catalog is empty? Well, duh! When should the widget catalog ever be empty?! The search page shows screwy results because someone entered an item with a negative price? Well, duh! Get that stupid item out of the catalog! Instead of thinking like a mathematician or a system designer in order to avoid mistakes, the typical development approach is to think like the customer so you only make mistakes that the customer can sympathize with.


I think there's a lot of truth here.

I would also point out that for a lot of programming tasks, particularly in a corporate IT environment, "thinking like the customer" can be a good thing: it helps you make sure that you are building something that truly helps the customer do their job.

The risk is that no matter how much "domain experience" you have, the customer has more, and you can fool yourself into thinking that you understand what they need better than they do.


Speaking as the author: his point, if any, is more along the lines that if we need a test-as-you-go approach to get something this simple to work, it's no wonder we have problems when we have to implement something that's actually complex.

Also: no amount of test-drivenness can save you from just not grokking the problem (as Peter Norvig's Sudoku solver showed ... er, by counter-example.)


I'm missing the back story -- tell me more about Peter Norvig's solver?



A binary search is that complicated. It just doesn't look like it until you've considered all the possible edge cases

The fact that it's tractable to rigorously consider all the edge cases means it is ridiculously simple compared to common real-world programming tasks.
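To make "rigorously considering all the edge cases" concrete, here is one loop shape that survives the usual ones (empty array, absent key, first/last element) by maintaining a half-open-interval invariant. This is a sketch in Python with names of my own choosing, not code from the article:

```python
def binary_search(a, key):
    """Return an index i with a[i] == key, or None if key is absent."""
    lo, hi = 0, len(a)                # invariant: key, if present, is in a[lo:hi]
    while lo < hi:
        mid = lo + (hi - lo) // 2     # overflow-safe midpoint (matters in fixed-width languages)
        if a[mid] == key:
            return mid
        elif a[mid] < key:
            lo = mid + 1              # key can only be right of mid
        else:
            hi = mid                  # key can only be left of mid
    return None
```

The half-open convention is what makes the empty array fall out for free: `lo == hi` immediately and the loop never runs.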


If you read Bentley's original work that the article is based on, the point is exactly that even the most seemingly simple tasks, like implementing a binary search, are incredibly difficult, if not nearly impossible, to get "right."


If binary search is complicated, what is _simple_ by your standards?


FizzBuzz?


Which only 50% of us can solve correctly the first time?
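For anyone who hasn't seen why even FizzBuzz trips people up: the classic mistake is testing divisibility by 3 or 5 before 15, which makes the "FizzBuzz" branch unreachable. A minimal sketch:

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for n. The % 15 test must come first,
    otherwise the "FizzBuzz" branch can never be reached."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)
```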


> "Here's a description, implement it perfectly without testing".
> That's not even close to how developers work

You'd be surprised. Also, terrified.


My first job after college (1992) was with Andersen Consulting. All new staff consultants went to a "boot camp" to learn how to program (they hired many people who did not have a programming background, as well as some who did). They emphasized "desk checking" as part of the development process (for this reason: submitting a compile job on the mainframe might have a turnaround time of hours).

So, one of the assignments was to write a program, and desk check it until you were confident that it would compile and run correctly on the first try. It was a one-chance, pass/fail assignment: it either worked, or there was a compile or runtime error.


Another reason is that by desk checking, you spend substantially more time working on the program - which means more billable time as a consultant.


Maybe it's not "real world" to have to implement a simple but serious algorithm in a few lines.

Maybe in the "real world", the programming is just the simple gluing together of hard stuff like databases, compilers and operating systems, since they have already been written. Once you have some bullet-proof method, it really is better to just be a consumer of it.

There is still some hard stuff that needs to be done, though. I suspect that to do "the hard stuff" you need to be able to implement simple-but-serious algorithms in a few lines. This makes your overall code smaller, more readable and allows optimization for the particular cases you're working with.


Maybe it's not "real world" to have to implement a simple but serious algorithm in a few lines.

I don't think the parent was objecting to implementing a simple but serious algorithm in a few lines, but to doing so without performing any testing at all.

There is still some hard stuff that needs to be done, though.

Yes, and I'm pretty sure the people doing that hard stuff do some testing ;)


Going back to Bentley, the point was not to code without testing but whether you could think through a small piece of code *before* testing.

raganwald says, "It feels very retrograde to try to write the whole thing out and only then start thinking about [...] edge cases." Nobody said you had to write it before thinking about edge cases! The idea is to think about the edge cases and incorporate that thinking into the code before you actually test it: the question is whether you can be, and whether you know the steps required to be, that thorough in your head.

One way to think about edge cases, of course, is to write the test code before the function. But after writing the tests, try writing the function correctly before testing it.

Of course, one of the potential bugs in binary search doesn't occur to people until they write the code and then try it. And that bug is hard to test for; ahem, I don't want to explain why.


Of course you shouldn't just write the whole thing out and then start to think about the errors. And these days most code is developed incrementally.

However, not all code is developed incrementally, testing minuscule details along the way. Some code is written perhaps 20 lines at a time before testing. Binary search is only about 20 lines; my implementation is significantly shorter.

So there are cases where 20 lines are written, then tested. The thought experiment here is to assume that this is one of those cases, of which there are probably 5 or 10 in a moderate-sized project: places where people got a little lazy, where it was obviously easy, or where they didn't bother following full TDD.

And as I say in a comment elsewhere, only 1 in 3 of the people who tested passed my test suite.

Your point is valid, but misses some of the point of the article, and the exercise.


I think that writing a binary search algorithm without tests is evidence of a poor programmer (or, at the very least, of poor programming practice; there are talented programmers out there who can just sling code). This is a difficult and, as the sibling comment noted, error-prone algorithm.

These types of problems are poster children for TDD and testing. There are under 20 lines of code but several ways to make logical errors. If we test-drive the code, we can ensure that the algorithm performs exactly how we want it to.


I hate to break it to you, but you'll fail interviews at the better companies with that attitude(1). No matter how much support your work methods give you, you have to be able to hold and work with a few moving parts in your head at once, correctly.

(1) Don't ask me how I know :)


In my experience, this isn't true.

In fact, my 'attitude' of approaching work like a craftsman has served me very well in job interviews (even landing me opportunities at pie-in-the-sky type of jobs).


Well, I wouldn't say the point of the exercise and my point are entirely disconnected; perhaps the point of the exercise is to remind people that the size of their increments is really driven by the complexity of the test suite and not by the expected length of the code.

It's an idea I explored very poorly in an old blog post:

http://github.com/raganwald/homoiconic/blob/master/2009-06-0...


"however the industry has been deprecating implementing an entire algorithm in one go in favour of implementing algorithms incrementally"

I dunno - for something as simple as a binary search there is nothing to increment. Maybe I misunderstand and you meant "incremental" as in where you write a whole algorithm, make sure it works, and then improve it (optimise/remove bugs/add features - not finish it)?

"It feels very retrograde to try to write the whole thing out and only then start thinking about off-by-one errors, computer arithmetic issues, and other edge cases."

Yeah... that is backwards. You should think of these things in advance, and if you have experience, you might have had to deal with them already. Binary search is especially simple - if you get the algorithm wrong I'm inclined to believe you are inexperienced, but if you just make typos and such, well... I'd expect that from anyone.

Then again all the rapid application development options today mean that anyone can use a binary search without writing one... or even write complete software with no real understanding of how it works.

Maybe I'm just out of date?


> for something as simple as a binary search there is nothing to increment. Maybe I misunderstand and you meant "incremental" as in where you write a whole algorithm, make sure it works, and then improve it (optimise/remove bugs/add features - not finish it)?

I strongly suspect--and I am not being sarcastic or falsely modest--that I am not up to scratch as far as programming is concerned. I almost always get stuff like this wrong when I implement it. My code breaks for empty lists, or with just one element, or lists with an odd number of elements, or something else I can't imagine.

Exposure to recursive algorithms has taught me to build this kind of thing up from simple cases like an empty list or a list with one element. So yes, I really would write a version that works for the empty list, a version that works for either the empty list or just one element, and go from there.
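For what it's worth, that build-up-from-the-empty-list style translates directly into a structurally recursive search. A sketch (names mine), with an offset threaded through so the slice-based recursion still returns an index into the original list:

```python
def bsearch(a, key, offset=0):
    """Structurally recursive binary search built up from the base case:
    the empty list, where the key cannot possibly be."""
    if not a:
        return None
    mid = len(a) // 2
    if a[mid] == key:
        return offset + mid
    if a[mid] > key:
        return bsearch(a[:mid], key, offset)             # search the left half
    return bsearch(a[mid + 1:], key, offset + mid + 1)   # search the right half
```

Slicing copies, so this is not how you'd write it for performance, but every call works on a strictly smaller list, which makes the inductive argument for termination and correctness easy to state.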


You'll love "How to Design Programs" (http://www.htdp.org/), they put a lot of emphasis on structural recursion.


Short answer: Yes, I am, including without testing.

For several years I ran a challenge for exactly this question. I have a test harness and test suite, and would run people's programs through it. Roughly 1 in 30 or 40 of those submitted without testing (self-certified) would pass all the tests. Those who tested before submission passed about 1 time in 3.

The vitriol heaped on me was quite unbelievable, and in one case led to concerns for my physical safety. People who had decided to try the challenge then rounded on me and said it was unfair, unreasonable and unrealistic. I didn't affect the results - the vast, vast majority of people simply could not write a binary search that passed my test suite.

FWIW, my first attempt had exactly one problem, to the best of my knowledge. I decided to derive my code formally from the specification. The mistake concerned computer arithmetic, not the algorithm, and was the one lunk to from the article.

But my background is unusual.


> Those who tested before submission passed about 1 time in 3

"Programmers are not to be measured by their ingenuity and their logic but by the completeness of their case analysis." --Alan Perlis


I would add: and by their completeness in testing those cases, and by the correctness of the implementation.


'lunk' - a word is coined!



"No incremental testing" strikes me as a rather valueless constraint. Small experiments are one of the most crucial skills -- if not the crucial skill -- in writing correct code. It's like asking me to swim with one arm tied behind my back to see if I am a good swimmer.

Sure, I can run these 'tests' on scratch paper or even in my head, if you require me to. But real life never requires this. I prefer to prove to myself that each piece of an algorithm is correct--using a combination of reasoning and microtesting--before moving on to the next. Why would I waste attention doing this all in my head? The computer will do many pieces of it for me quickly, accurately, in a way that catches language idiosyncrasies, and most importantly frees my attention for considering more abstract errors.

Edit: I had to try anyway, and (as far as I can tell) I was able to do it correctly in about three minutes. But I suspect it has less to do with native programming ability and more to do with having written dozens of variants on binary searches in the past year (I've had reason to want to handle various types of fuzzy matches, filtering and desired multiplicities in return values). FWIW, even after a fairly exhaustive suite of random input tests, I trust this binary search I have just written far less than the many I've recently written using my usual incremental methodology. Were I to include this in an actual release, I would not attempt to prove what I just wrote correct. I would destroy it and do the job properly from scratch.

Which rather underscores my point. I just wrote some code the wrong way, and even in spite of quite a lot of recent domain experience, a pretty good test suite, and a lot of staring at it and thinking about it . . . I don't trust it because I didn't test the pieces while I was writing it.


It's like asking me to swim with one arm tied behind my back to see if I am a good swimmer

I look at it more as an exercise akin to athletes running with weights. When compared with competition conditions, it is totally unrealistic, but it definitely strengthens you.


Do you want to send me your code for testing in my test suite? You've obviously tested it now, but it would still be interesting.


All right. It's in Perl. Here's what I have:

  sub binary_search {
    my($ary, $val) = @_;
  
    print "Looking for $val...\n";
  
    $val < $ary->[0] and return undef;
    $val > $ary->[-1] and return undef;
  
    my $min = 0;
    my $max = @$ary - 1;
    my $mid = int(($max + $min)/2);
  
    while($max - $min > 1) {
      $ary->[$mid] == $val and return $mid;
      $ary->[$mid] >  $val and $max = $mid;
      $ary->[$mid] <  $val and $min = $mid;
  
      $mid = int(($max + $min)/2);
    }
  
    $ary->[$min] == $val and return $min;
    $ary->[$max] == $val and return $max;
  
    return undef;
  }
  
  my $ary_size = int(rand(10));
  my @ary;
  for(my $i = 0; $i < $ary_size; $i++) {
    push(@ary, int(rand(10)));
  }
  
  @ary = sort @ary;
  
  print join(",", @ary), "\n";
  my $rv = binary_search(\@ary, int(rand(10)));
  print "Returned $rv\n";

I see already that it has the famous 'large array' bug. Not something I encounter in my work, so I'm not on the lookout for it. I'm curious whether you'll find anything else wrong with it.


Seems OK to me. I liked the "($max - $min > 1)" and the "$ary->[$min] ==/$ary->[$max] ==" checks that removed the need to add or subtract 1 inside the while loop. Nice to see a variation once in a while :)


Can/do you write in C? Would someone care to translate this?

I'll do it tomorrow at work, but if someone can post and/or email me an equivalent C version it will reduce my time. Not a problem, but just more convenient.

You'll need to give me some way to contact you so I can send you the results.

EDIT: I've just realised the spec I usually give asks for the first occurrence of a key to be returned in the case where there are duplicate keys. I'll modify my tests to account for that. It's an especially useful feature in a binary search, but rarely implemented.
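A sketch of that first-occurrence variant (function name mine): drop the equality test from the loop, keep narrowing toward the leftmost boundary, and check once at the end.

```python
def bsearch_first(a, key):
    """Index of the FIRST occurrence of key in sorted a, or None."""
    lo, hi = 0, len(a)       # invariant: the first occurrence, if any, is in a[lo:hi]
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] < key:
            lo = mid + 1
        else:
            hi = mid         # a[mid] >= key: the first occurrence is at mid or to its left
    if lo < len(a) and a[lo] == key:
        return lo
    return None
```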


Having to do it in C is even more perverse than doing it in my head.

;)


In all seriousness, if you have to translate it, don't bother. I write in C nearly daily, and I certainly wouldn't trust myself not to lose something in translation.

I'd be interested in what's in your test suite, though. I must sheepishly admit that my 'fairly exhaustive' test consisted of running the above random input until I'd seen all the usual tricky cases go by:

  - an array of length zero
  - an array of length one
  - an array of length two
  - the value is present
  - the value is absent and too small
  - the value is absent and too large
  - the value is absent and in the middle
  - the value is the first entry
  - the value is the last entry
  - the array is all one value 
  - there are multiple copies of the value you're looking for
If you've got anything more interesting in that test suite, that'd be good to know. Like I said, I represent the one-bazillionth of programmers that actually have a recurring professional interest in correctly coding binary searches.
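One way to stop relying on random inputs eventually covering those cases is to pin the small arrays explicitly and use a linear scan as the oracle. A sketch of such a harness, written against a hypothetical `search(a, key)` that returns an index or None:

```python
import random

def check(search, trials=1000):
    """Run `search` against a linear-scan oracle. The hand-picked cases
    cover the empty/tiny/all-duplicate arrays that random sizes may take
    a while to hit; the random cases cover everything else."""
    cases = [[], [1], [1, 1], [1, 2], [5, 5, 5, 5]]
    for _ in range(trials):
        cases.append(sorted(random.randrange(10) for _ in range(random.randrange(10))))
    for a in cases:
        for key in range(-1, 11):    # spans absent-and-too-small through absent-and-too-large
            got = search(a, key)
            if key in a:
                assert got is not None and a[got] == key, (a, key, got)
            else:
                assert got is None, (a, key, got)
```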


"the array is all one value" got me. After fixing that, despite the excessive verbosity, my experiment in literate programming passes the other tests - but I'm sure I coded a few more bugs into it - if you have some stock suite to run it through, I'd be very interested to know how many!

http://stdio.be/mybsearch/mybsearch.txt for the text and http://stdio.be/mybsearch/mybsearch.c for the notangle-d C code, if you fancy a look at it.


An array of length Maxint?


Good call. As per link in http://news.ycombinator.com/item?id=1277701 (elsewhere in this page):

the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of Programming Pearls contains a bug [...] it fails if the sum of low and high is greater than the maximum positive int
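The failure is easy to reproduce by simulating 32-bit arithmetic (Python integers don't overflow, so `ctypes.c_int32` stands in for C's int here):

```python
import ctypes

INT_MAX = 2**31 - 1
low, high = INT_MAX // 2, INT_MAX     # a plausible state deep into a huge array

# The textbook midpoint: low + high wraps to a negative number in 32 bits.
wrapped = ctypes.c_int32(low + high).value
assert wrapped < 0                    # mid would then index garbage (or crash)

# The standard fix keeps every intermediate value in range when low <= high.
mid = low + (high - low) // 2
assert low <= mid <= high
```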


Mine is:

    -- find index of thing in sorted things in the range [first, last)
    function bsearch_2(things, thing, first, last)
       if first == last then return nil end
       assert(last > first)

       local midpoint = math.floor((first + last) / 2)
       assert(midpoint >= first)
       assert(midpoint < last)

       local test = things[midpoint]
       if test == thing then return midpoint end
       if test > thing 
       then return bsearch_2(things, thing, first, midpoint)
       else return bsearch_2(things, thing, midpoint+1, last)
       end
    end

    function bsearch(things, thing) 
       return bsearch_2(things, thing, 1, #things+1) 
    end
That's Lua, which uses 1-based indices. I'd be happy to hook up input-output stuff to it if you like.


Nope, and I can't say it ever came up in the 20 years I've been programming for myself and others.


I've written one. It was for an embedded system.

A good, well tested binary search is a pain in the ass to write...

1. Your sorted array MUST use exactly the same ordering as the search is expecting. We built and sorted the array on a PC, and ran the search on the device, so this was not guaranteed.

2. There can be NO DUPLICATE KEYS. Duplicate keys will eventually send the search down a blind alley and off into hyperspace. This is also devilishly difficult to find if the array has a lot of items and you don't know what the problem is.

3. The above 2 problems can sneak in during production with new data.


In a well-implemented binary search duplicated keys are not a problem. You can even tune your code to return the first, the last, or a "random" one.

The point about the sort order matching the search order is more serious.

I've written all sorts of fundamental algorithms for embedded systems and I only really started to get good when I wrote the spec, wrote the proof, then derived the code. This was done informally, but a background in Pure Math let me do it without too much effort.

And it was worth it. Code developed that way tended to run first time with no surprises. After a while we increased the number of things we needed to prove about the code, and reliability was solid.

It seems the world of programming divides into several sub-classes, including (but not limited to):

+ Those who program web sites and databases

+ Those who program numerical systems and simulations

+ Those who program embedded devices and fundamental algorithms.

These are broad, and there are no doubt many more, but the skills required in "programming" seem to be dividing and become more differentiated.


Yeah, the duplicate key business came in because of other code hanging off the side of the search (a rather hairy algorithm that was searching for the next key with a different letter in a given position -- given KLXX, increment the 2nd position until there's a key matching that criterion: KLXX --> KMAA, or if there are no KM's, KN, KO, etc.).


Yeah, I was just bitten by the sort order one a few days ago; the key was a 12-octet unsigned integer, I combined a 64-bit comparison with a 32-bit comparison, and got the endianness mixed up.

I used the same compare function to generate the sorted table, and also for testing. So everything seemed to work fine in my unit, but once I took externally generated data...

My unit also involves a B+tree and somewhat complicated merge sorting, so I first suspected they were broken and wasted several hours hunting down the wrong path. Indeed, this article is a good reminder that I'm among the 90% of folks.


I just did this recently to work around a bug (enhancement?) in a vendor's API.

By some coincidence, I'd looked at "Programming Pearls" and tried this exercise and got it wrong. My mistake was moving the left/right (beginning/end, whatever) to one more/less than the target location. At least I have plenty of company.


That's because you're taught recursion is bad/difficult, and you're probably operating at the bucket-and-pointer level of abstraction.

Binary search, and more specifically recursion over lists and trees and logical reasoning over them with induction, is one of the most primitive skills one acquires while learning Lisp.


I have to say that the usual way of writing binary search recursively doesn't help break the problem down. It's already as broken down as it gets, and you're just tweaking the single case. The loop version wins for clarity.

I think the second runner-up is the recursive version where you use slices... does Lisp have slices?


Loops are just special cases of recursion.

And binary search is definitely not the right operation on linked lists. So even in Lisp you should use a different data structure if you want to do a binary search.


I've written binary search for production code (Delphi RTL generic collections), and gotten it right, but more embarrassingly, I also wrote quick-sort and got it wrong. It was a mishmash of two strategies IIRC - a straight recursive quick-sort, and a half-inlined version with a loop - and it didn't work at all. It escaped into the wild owing to time crunch and me electing to squeeze too many things in at the last moment.


If your compiler/interpreter supports tail-call-optimization you should get the half-inlined version for free.


Nothing like an "only x% of you can write this" to get programmers frantically posting code fragments.


NO TESTING until after you’ve decided your program is correct.

Why? What is there to be gained by asking programmers to abide by such an unnatural constraint? I would never "decide my program is correct" without having tested at least bits and pieces of it in the REPL.


Because every programmer's gut reaction to binary search is "that's easy!". The no testing rule says "orly? then show me"


every programmer's gut reaction to binary search is "that's easy!"

Perhaps I'm not a programmer then. Or I'm a strange one I guess.

I tend to approach every programming problem assuming the worst: This will be much harder than it seems. There will be corner-cases. I will miss some of them and do the wrong thing for other ones. I should try to enumerate every weird condition I can think of, and then throw random inputs at things to catch the stuff I missed.


Sounds like I'm not a programmer, either. Working on the Programming Team for my school for many years taught me to consider edge-cases.

And personally, there's quite frequently SOMETHING I forget when writing a binary search, and it usually ends in an infinite loop -- so I often have to go find my stupid mistake.


Binary search is easy. Unless you're going to force me to do it with a Turing machine, and all names and notes must be in pig latin, and I have to do it all while gargling lemonade.

But then, that wouldn't exactly be the fault of binary search, would it?

There's a reason incremental testing is normal practice. It makes you smarter.


Because it forces one to think things through instead of using trial and error.


Isn't one of the advantages to programming over other disciplines that we can create unit tests that define "success" and iterate quickly until we get there?


Unit tests are code too. What ensures the correctness of the tests themselves?

My point is simply that you can't defer to testing as a cop-out to avoid claiming confidence in the correctness of some portion of code. There has to be some amount of code that a dev can pretty well validate simply by inspection. (I'm not saying don't test.)


You say that as if it's a given you can only do one or the other.

FWIW, I'm a pure mathematician by training - not exactly a shortage of experience in thinking things through. But I would have very low confidence in any code I had just written down, instead of building it iteratively, using a REPL.


Maybe the wording used was a little unnatural.

None of us write code and have no clue whether it's correct or not. We should have a certain degree of confidence.

In the example given, I think it would be more realistic to ask devs to write the function (without testing) and then note how confident they are in its correctness. I have no problem with someone who is pretty sure they have bugs and can discover and fix them. But if someone is confident they have it right and they're dead wrong... now we have a problem.


No, there's a point to doing the unnatural thing of, "Keep working on it till you're sure you got it right."


Because trying it and failing teaches you not to do that. Or, more generally, 90% of programmers failing the task tells us programmers are bad at programming without testing, and we need to have solid testing in place when we code.


You do realize that testing can only show the existence of bugs, not their absence or the correctness of the program?


Nearly All Binary Searches and Mergesorts are Broken: http://googleresearch.blogspot.com/2006/06/extra-extra-read-...


> One of the reasons I’d like to see a copy of the Second Edition is to see whether this passage has changed

For the author: it hasn't changed. Still only 10%.

I actually had to write this binary search algorithm in my technical competency interview for my current job. Rather than hours, I had minutes, and the extra gotcha was that it was unsorted data, so I had to write quicksort first. I remember feeling excited that I was going to be working for a company that had such low-level issues, but eight months in, it's still the only time I've written my own sort or search methods since University :)


A friend of mine once did work in C++ for the DoD. There was a logistics application that used Binary Search, which had a bug he was working on -- lots of items were somehow "misplaced" in the system. Eventually, he discovered that there was no guarantee that the list being searched was sorted!

As for writing out whole C programs and getting it right without debugging -- I once wrote a bit-banging controller for a model aircraft. We did bit-banging with a Gumstix board, which would let us use GPIO to implement 9 PWM inputs and 9 or 10 PWM outputs in software. I wrote a bit-banging loop that did all of the inputs and outputs and let programmers define control mixing functions and finite state machines.

I was unable to test it, because you can only test something like this by attaching to the hardware. Well, it worked the first time and it can run the remote control of a model aircraft in flight. (There was one tweak later, to eliminate some jitter in one of the PWM outputs, but that was just a refinement of function. It just worked.)


I wrote my version in PHP; I wonder if someone could tell me what I did wrong (I didn't test it, so I assume there are mistakes).

  <?php
  function bsearch($search, $array, $start, $end){
      //$start cannot be zero.
      //$search not in $array
      if($start>$end){
          return -1;
      }
      //calculate middle
      $middle = $start + ceil(($end-$start)/2);
      //check match
      if($array[$middle]==$search){
          return $middle;
      }
      //is the middle higher than the search, if so, cut top
      else if($array[$middle]>$search){
          return bsearch($search, $array, $start, $middle-1);
      }
      //otherwise cut bottom
      else{
          return bsearch($search, $array, $middle+1, $end);
      }
  }

  ?>


Yup, no cigar.

1. You get a division by zero when $start == $end

2. When the element $search doesn't exist your program breaks

3. You always check array for value $middle, so $middle should have the range 0 <= $middle < length($array). But according to the comment $start cannot be zero. If so, then $middle is always larger than zero, and as such, the first element of the array is never considered.

So it's pretty buggy.


1. I knew there was an issue with start==end. What is the solution for that? I had originally written $start<=$end

2. I think this is connected to the previous issue, with how the recursion ends

3. I knew the zero thing would be an issue, but... I had to write it before the caps game started :(


Yay, I did it in 20 minutes and even remembered to test for the empty array :-)

It was kind of stressful to have to pass 13 testcases on the very first attempt. I rigged them to just shout "YOU FAIL" at me without telling the specific place. Reread the code for one last time and ran it. After it completed without printing anything, I stared at the screen for like ten seconds, then went and changed one testcase to deliberately fail in case I'd screwed up the printing. Nope - everything went better than expected :-)

But Raganwald is right, this exercise doesn't really tell anything about my programming skill. I mean, I don't work like that at all. My first attempts are always broken. Iterating is faster (or maybe only feels faster) than trying to get the first attempt right.


I remember reading the linked post on the Google blog. Ah, here it is: http://news.ycombinator.com/item?id=1130463 (Wow, that was only 60 days ago?)

Maybe that's why when I did this in C just now, my version ended up almost identical to the Java version. (Mine does the more correct "int middle = low + (high - low) / 2;", and I wasn't even thinking about the possibility of integer overflow the other way.) So yes, but only from the benefit of having written it numerous times in school and while teaching newbies to use it.


I started writing a solution and then stopped partway through because the pressure of getting it right in one shot was incredibly stressful.

If this were given to me for an interview question, I'd be incredibly meticulous and build a series of algorithms for len=0, len=1, len=2, etc., including various permutations of values, until I establish a general solution, which I would then submit for the final answer.

I would hardly ever do that in actual coding, though, because I could rely on hammering it with tests instead.


Side-issue: why isn't Golden section better known?

Luenberger's book [1] convinced me that a line search should be done with golden section, when I was hacking non-linear optimization. Yet I rarely see it mentioned - seems always to come down to binary search ...

[1] http://www.amazon.com/Linear-Nonlinear-Programming-David-Lue...


Binary means division into two. Dividing in unequal parts is still a binary search.

And Golden Section is only indicated under some circumstances. It's big-O the same, but not little-O the same, and it depends on the exact details. It's also hard to get those details right.


Thanks, good point on the binary ...


Binary search is for finding x so that f(x)=y, where y is given.

Ternary search and Golden section are for finding a local optimum at x, where f(x) isn't given.


This is interesting, a somewhat related post can be found here:

http://googleresearch.blogspot.com/2006/06/extra-extra-read-...

The gist of it is that the version of binary search implemented in the JDK contained a bug due to an integer overflow. I recall reading about it some time ago; funny to see it come up again.


I'm amazed how many interviewees can't even properly explain binary search, down to saying "linked list" as the base data structure.


> "linked list" as the base data structure.

I would enjoy reading a logically correct implementation of a binary search on a linked list. Discussing the performance complexity would be an obvious next question in the interview.

http://en.wikipedia.org/wiki/Schlemiel_the_Painters_algorith...



Well, a skip list is just a bunch of linked lists...


I would describe it more as a multiply-linked list. The important distinction in a skip list is that you get random access, in a different sense of the term, and so binary search makes sense. Similarly, you could justify a binary (or another non-sequential) search on a traditional linked list by assuming huge evaluation costs.


Here's a link to some of his work. A big fat hairy perl file:

http://cpansearch.perl.org/src/MIRK/Net-Z3950-Simple2ZOOM-1....

Knowing how to write code that is easy to read is just as important, if not more so, than knowing how to write binary searches.


How is that relevant? It's essentially equivalent to an ad hominem. Just because the guy wrote bad code doesn't mean he might not have a good point.


It is not an ad hominem, because he was not arguing that the point was invalid, rather that the guy is foolish to make such a statement while not advocating code structure and style. Notice "...is just as important..."

I agree - that file he linked is very difficult to read and would make it much harder for me to find problems.


I don't understand in what way you think the Simple2ZOOM code is deficient. Is it just because you prefer it to be broken into many small files?


Maybe I was fortunate; I learned how to write a binary search in my lower-level CS courses. I didn't even find it particularly difficult, but then again, the concept was explained quite clearly before I had to code it.


The trick with this binary search test isn't knowing or being taught or learning the algorithm, or finding it difficult; it's making sure you find and deal with every corner case, first time, without testing. For example: is an empty array handled properly? An array with duplicates? Is every item, including first and last, found correctly? What about arrays with an even number of elements vs odd, vs power of two? How about an array with over 1 billion elements (the integer overflow problem)? Are values before and after the sorted range indicated properly? If working with duplicates, do you want a found index at the start of the range, or at the end?

You get the picture.


So what this is really all about is this- do you mentally step through your code and consider what could go wrong at each step?


I would rather think in invariants - knowing what must be true at particular points in the code, at the start of loop bodies, etc. Essentially, doing a proof in one's head.


That's the point - nobody finds it difficult. And then they write buggy code without realizing it. You probably did, as well.


Some people here think this is about coding without testing. No, it's an example of thinking through before testing, about the kind of thinking that's separate from testing. It's not meant as an example of what all programming is about, but a probe of one aspect of programming. So why?

Is it okay to write down a guess, then look at it with edge cases in mind, fix and iterate? I think sketching is okay for this exercise as long as you follow through and finish.

The difference is that sketching doesn't tell you what the bugs are. Whereas testing can tell you specifically. One's an aid to thought and the other's... an alternative, to be neutral about it.

A couple people have described case analysis like simulating tests in your head. I hope they actually use more principled thinking than they're letting on.

Iterative coding breaks into two cases: either the task is subjective/aesthetic, like an interaction or layout problem, or it really is an exact task but you don't have it clear in your mind. Everybody recognizes the value of iterating on a fuzzy problem, put that aside. The question is, if it's an exact problem but you don't understand it well, then will test-writing clarify or obscure things?

Sometimes I iteratively write and run or test a program until I get to a point where I say, "Okay, now do I understand what this is about? Can I explain it thoroughly in full sentences?" Even so I'm sure I've fooled myself with tests.

In real world coding, you aren't given small problems. That doesn't mean small problems are poor examples. Obviously we should break big problems into problems as small and comprehensible as possible. Testing doesn't do that (although it could create incentives to).

Not to mention the fact that there's no such thing as a complete test for a program with an infinite set of possible inputs. You can test for what seem to be the kinds of bugs the program could have.

The unstated assumption of many here is that somehow in the process of testing we'll come upon all the right tests for that program, without any sneaky missed cases. Then you will look at the tests and code and somehow extrapolate or judge that you've covered all the bases. But how sure is this if you still aren't sure exactly what the problem is, or if you don't think about bugs and cases very well?

There's a kind of thinking you have to do within and around testing anyway. The write- before- testing challenge tests how well you do that necessary thinking, and how reliably you judge how well you've done. It's an artificial light on a real issue.

Btw, my test file, meant to be easily processed in any language: http://www.mac-guyver.com/switham/2010/04/Binary_Search



