Analysis of a Binary Search Function
So how did you do? Did you notice that it doesn't compile? My friend put his function declarations out of order -- oops! Within minutes, he received this email:
By Alex Allain
To: Joe From: Me Subject: Re: Untested Code Sorry Joe, it doesn't compile!Not wanting to win on what I still considered a technicality (I did offer him the chance to compile it!) I fixed his first bug and began a little bit of black box testing. (I had glanced over his search code looking for bugs, but nothing horrible stood out at me. In fact, it worked for the first few tests -- a couple of softballs, such as giving the list "1 2 3 4 5" and searching for the number 3 -- or looking for elements that didn't exist.
Then I decided to throw something a little bit trickier at his code: multiple copies of the same input: "4 4 4 4" and have it search for an element that wasn't in the list: 3. Success! A stack overflow.
Now, what did he do wrong? I spent a little bit of time testing other combinations of input, but his code wasn't terribly predictable, and testing without looking at the source code was getting a bit tedious. I went back, and started to check his recursion.
Base case? The base cases looked OK. I noticed that he had one extra base case: "if(start==end-1) ...", but it wouldn't cause a stack overflow. So if the termination conditions are correct, what could possibly be wrong?
There had to be a problem with the recursive calls -- they must not be making the problem simpler every time. The most obvious thing to check is whether he's correctly computing the new subarray. Is the current midpoint being included in the subarray? Nope, it isn't. He got that right. In fact, his extraneous base case would have protected him from any problems with that (exercise for the reader: why). This is a subtle bug, but not one that tripped him up.
So what's wrong? If you haven't found it yet, go back and spend some time going very carefully over his code, paying particular attention to his variable names and function parameters.
It turns out that he passes in the wrong value to the recursive calls. Instead of passing in the correct bounds for the array (either the midpoint + 1 or the midpoint - 1), midInd, he passes midVal. Amazingly, on some test cases this works -- including simple inputs such as "0 1 2 3 4". I'd probably have caught this earlier if some of my test cases had included larger values stored in the array -- when testing, try extremes even when it "shouldn't matter". It often does.
Finally, the third bug is that he never checks for size 0 arrays. Allocating 0 bytes of memory:
int *x = new int;will return a valid pointer, but not one that you are allowed to dereference. Testing for this would have been easy in the first binary search helper function.
Taking Stock of the Situation
Do all these mistakes in such a "simple" program mean my friend is a bad programmer? Not at all. The first bug would have been caught easily once he compiled the code. The second bug was so easy to find precisely because my friend is a good programmer and used good variable names. If he hadn't, I would have spent a lot longer scratching my head trying to figure out what was going wrong. Instead, because the use of the variable didn't correspond to the name of the variable, I was able to pinpoint his bug almost immediately once I knew where to look.
The third bug, not handling zero element arrays, would have been easy to find by inspection once you start to consider corner cases. This is perhaps the only bug I can really fault him for -- it's non-trivial, since you shouldn't dereference a pointer returned from new when the size is 0 (it's a "unique pointer" with size 0, and safe to pass to delete, but not safe to dereference).
What to Take Away
How did I know that my friend would have so much trouble with such a simple program? First, perfect programming is hard -- even the best programmers make mistakes. Second, my experiences have reinforced the old adage that a line of code that you haven't tested is a line of code with a bug. But I had one other secret: experimental evidence.
In Jon Bentley's classic work, Programming Pearls, he gives the task of implementing a correct binary search as an apparently simple, but surprisingly difficult challenge. He mentions that of the large number of professional programmers whom he's challenged to write a bug-free binary search, fewer than 50% get it right on the first try (in fact, it's much closer to 10%).
Perhaps more surprising still is that, according to Donald Knuth in The Art of Computer Programming, it was over ten years after the first publication of a binary search algorithm that fully general solution was published. In programming, nothing is quite as simple as it seems (try this article on leaky abstractions for another take).
Would I use my friend's code with the modifications I have suggested? Maybe, but I'd be tempted to look up valid binary search code from a more rigorously vetted reference, used and checked by more than one programmer.
I leave you with a quote from Knuth, describing some code he had written in a memo: "Beware of bugs in the above code; I have only proved it correct, not tried it."
Oh -- and yes, Joe did take me out to lunch. I have a lovely veggie burger. He just had an appetizer...knowing now that even a simple program is a lot to chew on.
Learn more about search techniques
Learn more about the various sorting algorithms