Note: I’m giving the code samples in this post in almost-executable ruby code. I hope it is understandable even for non-rubyists, but please let me know in the comments what you think.

I’m guessing you’ve heard of binary search (or you can check out Binary Search on Wikipedia). What you might not have heard about is a slightly different and more general formulation that leads to surprisingly elegant solutions for some problems.

Slightly more generic

All one really needs for binary search is a function f(index) and bounds i_min, i_max. The function returns an indication of whether the index is too low (-1), too high (+1) or just right (0). We also need to guarantee that the function is monotonically increasing – if a <= b then f(a) <= f(b). Having those we can find the smallest value of index where f(index) is equal to 0 using the following well-known procedure:

def binary_search(f, i_min, i_max)
  while (i_max - i_min).abs > 1
    mid = (i_max + i_min) / 2

    if f.call(mid) <= 0
      i_min = mid
    else
      i_max = mid
    end
  end

  i_min if f.call(i_min) == 0
end

How do I use that?

Well that was dry. The important thing here is that we are finding what we were looking for using O(ln(i_max - i_min)) steps, because the search space halves with each step and that we are calling the function f exactly once for each step. We can check that this works as a regular binary search by providing a specially crafted f that performs lookups in a sorted array:

fruit = [:apple, :banana, :orange, :pear]
binary_search(->(i) { fruit[i] <=> :banana }, 0, fruit.count)
# => 1

Note that <=> is ruby’s spaceship operator that compares two things and returns -1 if the first one is smaller, 1 if the second one is smaller and 0 if they are equivalent – exactly what we need.

So far so good, but why the additional complexity of providing a function instead of an array? Well, it turns out that there is a whole class of problems where an elegant solution can be obtained by a sort of “shooting” method. You shoot, write a procedure that checks if your shot went high or low and then adjust, based on that. If you couple that with binary search you can find the target efficiently – you’ll be only adding a factor of O(ln(N)) to the complexity of checking if a single shot went high or low.

Applications

Let’s take the following problem: we are given an array A of N positive numbers. Our task is to find a suffix B of that array with the property that the sum over i of B[i] * i is equal a given number S. It’s easy to see that given a certain suffix A we can check if it satisfies that property in at most N steps. Now we can use our binary search code to find the proper suffix!

binary_search(
  ->(i) { S <=> A[i..-1].each_with_index.map { |j, x| j * x }.inject(:+) },
  0, A.count
)

Since checking a single solution requires N steps we only need O(N * ln(N)) steps to complete this computation. Importantly the longer the suffix the larger the resulting sum so the required monotonicity property holds.

For one more example we can imagine an API where you can request all records since a given date, but it’s a slow operation if it turns out that there are many records. If there is also an API that can provide us with the count of records since a given date in a fast manner we can use binary search to get the last 10 records:

binary_search(
  ->(i) { count_records(Time.at(i)) <=> 10 },
  0, Time.now.to_i
)

This code will call the counting API (count_records) at most O(ln(Time.now.to_i)) times – since Time.now.to_i is the Unix timestamp which at the time of this writing is 1430738848 that’s 30-31 calls to the counting API at most. That might seem like a lot, but might still be much faster than requesting all records since the beginning of time only to find that you’re trying to transfer hundreds of megabytes of data.

What these two examples have in common is that you don’t need to generate the array that you will be searching beforehand. If you have a way to compute the elements of that array on-demand it will be enough to use binary search on it. Or put another way: if you have a way of checking if a specific solution is “too big” or “too small” in linear time you can find the right solution in log-linear time using this method.

Posted by

Share this article

  • Very interesting! I hadn’t thought about using a binary search to optimise API calls before. Clever. :)

    Another place that binary search is really useful is `git bisect` for finding a commit where a bug first started occuring.

    Also, fun fact—there’s one subtle bug that shows up in nearly all binary search implementations:

    mid = (i_max + i_min) / 2

    With the above, it’s possible to start hitting integer overflow issues on the addition on larger datasets.
    You can re-arrange it a little bit to avoid the problem:

    mid = ((i_max – i_min) / 2) + i_min

    • Paweł Obrok

      Good point! That will certainly matter for languages where integer overflow is a thing. I tend to forget about such things as a Rubyist/Elixirist, where arbitrary-precision integers are the default.