This post is dedicated to the “secretary problem,” where one wishes to hire a secretary. For some reason the process is as follows: you interview \(N\) candidates, in a uniformly random order. You may hire a candidate if and only if you make them an offer after their interview and before the next candidate’s interview; once a candidate is passed over, they cannot be recalled. You consider your hiring process a success if and only if you hire the strongest candidate in the pool.
You settle on the following strategy: for a certain threshold \(T(N)\), you interview the first \(T(N)\) candidates but do not make any of them an offer. You then offer the job to the first subsequent candidate (if any) who is stronger than all of the first \(T(N)\) candidates.
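To make the setup concrete, here is a minimal Python sketch of this strategy (the function name `secretary` and the toy scores below are my own choices, not a standard API):

```python
import random

def secretary(values, T):
    """Skip the first T candidates, then hire the first one who beats
    every candidate seen during that initial sample.
    Returns the hired index, or None if nobody later qualifies."""
    benchmark = max(values[:T])
    for i in range(T, len(values)):
        if values[i] > benchmark:
            return i
    return None

# One random instance: 25 distinct skill scores in a random interview order.
scores = random.sample(range(1000), 25)
hired = secretary(scores, T=9)  # 25/e is about 9.2, hence T = 9
print(hired, hired is not None and scores[hired] == max(scores))
```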
The goal of this post is to show that this seemingly ridiculous strategy is actually widely applicable in various online settings, and that it can guarantee an unexpectedly large success probability of about \(0.368\) - though critics would describe it as 3.6 röntgen (not great, not terrible).
This setting, albeit strange, is a typical example of an online problem - you're given data in a stream, and you need to make decisions on the fly with limited space (without saving the data). In the natural generalization of this problem, you're given \(N\) rewards, revealed one at a time, and are asked to stop at the reward you're happiest with. If you commit too early, you might lose out on a greater value that arrives later in the game; conversely, if you explore too long, you might have already passed over your best option. Therefore, this problem (in some ways) gets at the classical tension between exploration and exploitation.
This has obvious applications in reinforcement learning, where each candidate plays the role of a reward signal. It can also be used in various consumer-focused settings - for instance, deciding which product to settle on after sampling a sequence of alternatives. There are many other industrial and online-algorithmic applications for this problem too. It really is quite generalizable!
We first operate under the assumption that all candidates can be assigned distinct skill values, so that there is a strict total ordering among them. Let \(A\) be the event that the best candidate is hired. Then, let \(A_i\) be the event that candidate \(i\) (the candidate interviewed \(i\)'th) is hired, and let \(B_i\) be the event that candidate \(i\) is the best candidate for the role. Since \(A\) is the disjoint union of the events \(A_i \cap B_i\), the law of total probability lets us write:
\begin{align*}
\Pr(A) &= \sum_{i=1}^N \Pr(A_i \cap B_i) = \sum_{i=1}^N \Pr(A_i | B_i) \Pr(B_i) = \frac{1}{N} \sum_{i=1}^N \Pr(A_i | B_i)
\end{align*}
The last equality follows since \(\Pr(B_i) = \frac{1}{N}\) for every \(i\in[N]\): the interview order is uniformly random, so each of the \(N\) positions is equally likely to hold the best candidate. Additionally, note that \(\Pr(A_i|B_i) = 0\) for \(i=1\) to \(T(N)\), since the strategy rejects the first \(T(N)\) candidates outright. Thus, we can write:
\begin{align*}
\Pr(A) = \frac{1}{N} \sum_{i=1}^N \Pr(A_i | B_i) &= \frac{1}{N} \sum_{i=T(n)+1}^{N} \Pr(A_i | B_i)
\end{align*}
Next, consider \(\Pr(A_i|B_i)\) for \(i > T(N)\): the probability of hiring candidate \(i\) given that candidate \(i\) is the best. Given \(B_i\), candidate \(i\) is hired exactly when the strategy reaches position \(i\) without hiring anyone earlier (once reached, candidate \(i\), being the best, certainly beats the benchmark). This happens precisely when the best of the first \(i-1\) candidates sits among the first \(T(N)\) positions, so that nobody in positions \(T(N)+1, \dots, i-1\) beats the benchmark. Since the best of the first \(i-1\) candidates is equally likely to occupy any of those \(i-1\) positions, \(\Pr(A_i|B_i) = \frac{T(N)}{i-1}\); we sanity-check this identity numerically after the next display. So, this gives us:
\begin{align*}
\Pr(A) &= \frac{1}{N} \sum_{i=T(N)+1}^{N} \frac{T(N)}{i-1} = \frac{T(N)}{N} \sum_{i=T(N)+1}^{N} \frac{1}{i-1} = \frac{T(N)}{N} \bigg(\frac{1}{T(N)} + ... + \frac{1}{N-1}\bigg)
\end{align*}
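As a quick empirical check of the identity \(\Pr(A_i|B_i) = \frac{T(N)}{i-1}\), the following sketch plants the best candidate at a fixed position \(i\) and runs the strategy (the helper name and the particular values of \(N\), \(T\), and \(i\) are arbitrary choices of mine):

```python
import random

def hired_best_given_position(N, T, i):
    """One trial: put the best candidate (value N-1) at position i
    (1-indexed), shuffle the rest, run the strategy, and report
    whether the best candidate ends up hired."""
    rest = list(range(N - 1))          # the other N-1 skill values
    random.shuffle(rest)
    order = rest[:i - 1] + [N - 1] + rest[i - 1:]
    benchmark = max(order[:T])
    for j in range(T, N):
        if order[j] > benchmark:
            return j == i - 1          # is the first hire at position i?
    return False

N, T, i = 20, 7, 12
runs = 200_000
est = sum(hired_best_given_position(N, T, i) for _ in range(runs)) / runs
print(est, T / (i - 1))                # both should be close to 7/11 ≈ 0.636
```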
Now, let \(H(X) = \frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{X}\) be the \(X\)'th harmonic number. Then,
\begin{align*}
\frac{T(N)}{N} \bigg(\frac{1}{T(N)} + ... + \frac{1}{N-1}\bigg) &= \frac{T(N)}{N} \bigg(H(N-1)- H(T(N)-1)\bigg)
\end{align*}
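Since this is an exact formula for \(\Pr(A)\) (no bounds yet), we can verify it against brute force over all \(N!\) interview orders on a small instance. A sketch, with \(N\) and \(T\) picked arbitrarily:

```python
from fractions import Fraction
from itertools import permutations
import math

def formula(N, T):
    # (T/N) * (H(N-1) - H(T-1)) as an exact rational number
    return Fraction(T, N) * sum(Fraction(1, k) for k in range(T, N))

def brute_force(N, T):
    wins = 0
    for order in permutations(range(N)):
        benchmark = max(order[:T])
        hired = next((v for v in order[T:] if v > benchmark), None)
        wins += hired == N - 1         # did we hire the best candidate?
    return Fraction(wins, math.factorial(N))

N, T = 8, 3
print(formula(N, T), brute_force(N, T))  # both print 459/1120 (≈ 0.41)
```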
We then note that we can lower-bound this difference of harmonic numbers by an integral: the sum is a left Riemann sum of the decreasing function \(\frac{1}{1+x}\) over \([T(N)-1, N-1]\), so it dominates the corresponding integral:
\begin{align*}
H(N-1)-H(T(N)-1) &\geq \int_{T(N)-1}^{N-1} \frac{1}{1+x} dx \\
&= \ln(1+x)|_{T(N)-1}^{N-1} \\
&= \ln(N)- \ln(T(N)) \\
&= \ln\left(\frac{N}{T(N)}\right)
\end{align*}
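For instance, with the (arbitrary) choices \(N = 10\) and \(T(N) = 3\), the bound reads:
\begin{align*}
H(9) - H(2) = \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{9} \approx 1.3290 \geq \ln\left(\frac{10}{3}\right) \approx 1.2040
\end{align*}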
Thus, we get that
\begin{align*}
\Pr(A) &= \frac{T(N)}{N} \bigg(H(N-1)- H(T(N)-1)\bigg) \geq \frac{T(N)}{N} \ln\left(\frac{N}{T(N)}\right)
\end{align*}
Let \(x=\frac{N}{T(N)}\), so that \(\Pr(A) \geq f(x)\) where \(f(x) = \frac{1}{x}\ln(x)\). Note that \(x \geq 1\), since \(1 \leq T(N) \leq N\) (we know \(T(N)\neq 0\), as otherwise there would be no benchmark to compare against). To get the strongest guarantee, we maximize the lower bound \(f\) by setting its derivative to zero:
\begin{align*}
f'(x) = -\frac{1}{x^2} \ln(x) + \frac{1}{x^2} = 0 \implies \ln(x) = 1 \implies x = e
\end{align*}
Since \(f'(x) > 0\) for \(1 \leq x < e\) and \(f'(x) < 0\) for \(x > e\), this critical point is a maximum. So,
\begin{align*}
x = e\implies \frac{N}{T(N)} = e \implies T(N) = \frac{N}{e}
\end{align*}
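As a sanity check that the calculus points at the right threshold, the sketch below scans every \(T\) for a fixed \(N\) using the exact formula for \(\Pr(A)\) derived earlier (the helper name `success_prob` is mine):

```python
from fractions import Fraction
import math

def success_prob(N, T):
    # Exact Pr(A) = (T/N) * sum_{i=T+1}^{N} 1/(i-1), derived above
    return Fraction(T, N) * sum(Fraction(1, i - 1) for i in range(T + 1, N + 1))

N = 100
best_T = max(range(1, N), key=lambda T: success_prob(N, T))
print(best_T, N / math.e, float(success_prob(N, best_T)))
# prints: 37  36.787...  0.3710...  -- the best threshold sits right at N/e
```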
Then, substituting this value of \(T(N)\) into our bound gives \(\Pr(A) \geq \frac{T(N)}{N}\ln(\frac{N}{T(N)}) = \frac{1}{e} \ln(e) = \frac{1}{e}\). Hence, the choice \(T(N) = \frac{N}{e}\) (rounded to an integer in practice) guarantees a success probability of at least \(1/e \approx 0.367879\), for every \(N\). Therefore, this algorithm actually performs quite effectively given the hard constraints of this online setting.
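Finally, a Monte Carlo sketch of the full strategy at the threshold \(T(N) = \lfloor N/e \rfloor\) (the parameters below are arbitrary) lands right around \(1/e\), matching the analysis:

```python
import math
import random

N = 250
T = math.floor(N / math.e)             # N/e isn't an integer, so round down
runs = 100_000
wins = 0
for _ in range(runs):
    order = random.sample(range(N), N) # uniformly random interview order
    benchmark = max(order[:T])
    for v in order[T:]:
        if v > benchmark:
            wins += v == N - 1         # did the first hire turn out best?
            break
print(wins / runs, 1 / math.e)         # both land near 0.368
```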