Twenty questions, in the degenerate case when 20 = lg(n)
A new post at Computational Complexity asks the following question:
(Problem 1.) You are trying to determine a secret integer x between 1 and n inclusive, using only queries of the form “is x = a (mod b)?” for integers a and b of your choosing. How many queries are required to determine x in the worst case?
And more restrictively,
(Problem 2.) Same as above but you are only allowed to make queries where b is prime.
The comments section seems to reveal at least a partial solution, but these are fun problems, so at least give them a shot before reading the solution there (or here.) If you’d like the solution spoiled anyway, read on!
First let’s talk about a problem that has a more intuitive solution. Namely, the same problem of finding an integer between 1 and n, but where we are allowed to ask questions of the form “is x ≤ a?” for our choice of integers a.
Computer scientists in the audience should recognize the solution: first, find out about the midpoint. That is, whether x ≤ n/2. Whatever the answer is, we split the search space in half! That is, we originally knew that x must be one of n values. After the first query, we know that x must be one of n/2 values: either 1 ≤ x ≤ n/2 or n/2 < x ≤ n. Either way, we can simply repeat the process again! After the second query, we know it must be one of n/4 values. And so on. For example, for 1 ≤ x ≤ 8, the following sequence of questions would reveal that the answer is x = 3.

It should be easy to convince yourself of the fact that you can only make these divisions lg(n) (that’s log-base-2*) times before you determine x.
Furthermore, this is the best we can do: any algorithm that finds x using only true/false queries requires at least lg(n) queries in the worst case. Why? Essentially, because this is how many bits there are in the binary expansion of x. Take n = 8 for example. If you could determine any 1 ≤ x ≤ 8 using only two queries in the worst case, then any such x could be written uniquely using only 2 bits, which is impossible.
But let’s go back to Problem 1. Notice that the above argument applies for any algorithm that may only make queries about x whose answers are boolean. So we may immediately say that at least lg(n) queries are required by any algorithm that solves it.
However, is the answer again exactly lg(n)? Our binary search problem had the “nice” property that no matter what the answer to our previous query was, we could always construct another query that cut the search space in half again. Any boolean query may be thought of as a filter that eliminates a certain portion of the search space (which here is simply the set {1, 2, …, n}.) But not all queries may have this nice property. For example, if we were only allowed to ask questions of the form “is x = a?” then we would need to ask n – 1 questions in the worst case—because no matter in what order we guess numbers, our first n – 1 guesses could be wrong.
It turns out that the queries in Problem 1 do indeed have this nice property. The algorithm works like so:
- Ask whether x = 0 (mod 2). Doing so cuts the search space in half.
- If the answer above was Yes, then ask whether x = 0 (mod 4). Since the previous answer was Yes, it must be the case that either x = 0 (mod 4) or x = 2 (mod 4), and the answer to this query again cuts the search space in half.
- If the answer above was No, then ask whether x = 1 (mod 4). From the same reasoning as above, we again cut the search space in half.
- Repeat the process using mod 8, 16, 32, etc., crafting the appropriate query.
It shouldn’t be too hard to find the correct question to ask in each case. As an example,

This process can always be completed in lg(n) queries, which is the answer to Problem 1.
Note, however, that we always use a modulus of the form
, so this solution doesn’t help us solve Problem 2, where the modulus must always be prime! Don’t panic; I will discuss a solution to this problem in my next blog post.
* Here I’m assuming that n is a power of 2 to make the formula simpler, but if it isn’t then the correct answer is simply the ceiling of lg(n). That is, round lg(n) up. In fact, whenever I write lg(n) you can assume this is what I mean.