Binomial sum variance inequality
Consider the sum, $Z$, of two independent binomial random variables, $X \sim B(m_0, p_0)$ and $Y \sim B(m_1, p_1)$, where $Z = X + Y$. Then the variance of $Z$ is less than or equal to the variance $Z$ would have if $p_0 = p_1$, that is, if $Z$ itself had a binomial distribution. Symbolically, $\operatorname{Var}(Z) \leqslant \operatorname{E}[Z] (1 - \frac{\operatorname{E}[Z]}{m_0+m_1})$.
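As a quick sanity check before the proof, the two sides of the inequality can be computed exactly for one arbitrary, illustrative choice of parameters; the sketch below uses Python's `fractions` module to avoid floating-point noise.

```python
from fractions import Fraction as F

m0, p0 = 10, F(1, 5)    # X ~ B(m0, p0), illustrative values only
m1, p1 = 15, F(7, 10)   # Y ~ B(m1, p1)

ez = m0 * p0 + m1 * p1                            # E[Z] = E[X] + E[Y]
var_z = m0 * p0 * (1 - p0) + m1 * p1 * (1 - p1)   # Var(Z), by independence
bound = ez * (1 - ez / (m0 + m1))                 # variance if Z were binomial

print(var_z, "<=", bound, ":", var_z <= bound)    # 19/4 <= 25/4 : True
```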
Proof.
We wish to prove that
$$\operatorname{Var}(Z) \leqslant \operatorname{E}[Z] (1 - \frac{\operatorname{E}[Z]}{m_0+m_1})$$
We will prove this inequality by finding an expression for $\operatorname{Var}(Z)$, substituting it into the left-hand side, and then showing that the inequality always holds.
If $Z$ had a binomial distribution with parameters $n$ and $p$, then the expected value of $Z$ would be $\operatorname{E}[Z] = np$ and the variance of $Z$ would be $\operatorname{Var}(Z) = np(1 - p)$. Letting $n = m_0 + m_1$ and substituting $\operatorname{E}[Z]$ for $np$ shows that, under the binomial assumption,
$$\operatorname{Var}(Z) = \operatorname{E}[Z] (1 - \frac{\operatorname{E}[Z]}{m_0+m_1})$$
The random variables $X$ and $Y$ are independent, so the variance of the sum is equal to the sum of the variances. Since $\operatorname{Var}(X) = m_0 p_0 (1 - p_0)$ and $\operatorname{E}[X] = m_0 p_0$ (and likewise for $Y$), each variance can be written as $\operatorname{Var}(X) = \operatorname{E}[X](1 - \frac{\operatorname{E}[X]}{m_0})$, giving
$$\operatorname{Var}(Z) = \operatorname{E}[X] (1-\frac{\operatorname{E}[X]}{m_0}) + \operatorname{E}[Y] (1-\frac{\operatorname{E}[Y]}{m_1})$$
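The rewriting of each summand is a one-line identity, $m_0 p_0 (1 - p_0) = m_0 p_0 (1 - \frac{m_0 p_0}{m_0})$; a symbolic confirmation of it, sketched here with SymPy, may still be a useful check.

```python
import sympy as sp

m0, p0 = sp.symbols('m0 p0', positive=True)
ex = m0 * p0                                   # E[X] for X ~ B(m0, p0)
# Var(X) = m0*p0*(1 - p0) should equal E[X]*(1 - E[X]/m0).
diff = m0 * p0 * (1 - p0) - ex * (1 - ex / m0)
print(sp.simplify(diff))                       # 0
```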
In order to prove the theorem, it is therefore sufficient to prove that
$$\operatorname{E}[X](1 - \frac{\operatorname{E}[X]}{m_0}) + \operatorname{E}[Y](1 - \frac{\operatorname{E}[Y]}{m_1}) \leqslant \operatorname{E}[Z](1 - \frac{\operatorname{E}[Z]}{m_0+m_1})$$
Substituting $\operatorname{E}[X] + \operatorname{E}[Y]$ for $\operatorname{E}[Z]$, by linearity of expectation, gives
$$\operatorname{E}[X](1 - \frac{\operatorname{E}[X]}{m_0}) + \operatorname{E}[Y](1 - \frac{\operatorname{E}[Y]}{m_1}) \leqslant (\operatorname{E}[X]+\operatorname{E}[Y])(1 - \frac{\operatorname{E}[X]+\operatorname{E}[Y]}{m_0+m_1})$$
Multiplying out the brackets yields
$$\operatorname{E}[X] - \frac{\operatorname{E}[X]^2}{m_0} + \operatorname{E}[Y] - \frac{\operatorname{E}[Y]^2}{m_1} \leqslant \operatorname{E}[X] + \operatorname{E}[Y] - \frac{(\operatorname{E}[X]+\operatorname{E}[Y])^2}{m_0+m_1}$$
Subtracting $\operatorname{E}[X]$ and $\operatorname{E}[Y]$ from both sides gives
$$- \frac{\operatorname{E}[X]^2}{m_0} - \frac{\operatorname{E}[Y]^2}{m_1} \leqslant - \frac{(\operatorname{E}[X]+\operatorname{E}[Y])^2}{m_0+m_1}$$
Multiplying both sides by $-1$ reverses the inequality:
$$\frac{\operatorname{E}[X]^2}{m_0} + \frac{\operatorname{E}[Y]^2}{m_1} \geqslant \frac{(\operatorname{E}[X]+\operatorname{E}[Y])^2}{m_0+m_1}$$
Expanding the right-hand side gives
$$\frac{\operatorname{E}[X]^2}{m_0} + \frac{\operatorname{E}[Y]^2}{m_1} \geqslant \frac{\operatorname{E}[X]^2+2\operatorname{E}[X]\operatorname{E}[Y]+\operatorname{E}[Y]^2}{m_0+m_1}$$
Multiplying both sides by the positive quantity $m_0 m_1 (m_0+m_1)$ yields
$$(m_0m_1+{m_1}^2){\operatorname{E}[X]^2}+ ({m_0}^2+m_0m_1){\operatorname{E}[Y]^2} \geqslant m_0m_1({\operatorname{E}[X]}^2+2\operatorname{E}[X]\operatorname{E}[Y]+{\operatorname{E}[Y]}^2)$$
Subtracting the right-hand side from both sides gives the relation
$${m_1}^2{\operatorname{E}[X]^2} -2m_0m_1\operatorname{E}[X]\operatorname{E}[Y] + {m_0}^2{\operatorname{E}[Y]^2} \geqslant 0$$
or equivalently
$$(m_1\operatorname{E}[X] - m_0\operatorname{E}[Y])^2 \geqslant 0$$
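As a cross-check on the algebra of the last few steps, the difference between the two sides of the multiplied-out inequality can be factored symbolically, for instance with SymPy; in the sketch below, $x$ and $y$ stand in for $\operatorname{E}[X]$ and $\operatorname{E}[Y]$.

```python
import sympy as sp

m0, m1, x, y = sp.symbols('m0 m1 x y', positive=True)  # x, y stand for E[X], E[Y]
diff = (m0*m1 + m1**2)*x**2 + (m0**2 + m0*m1)*y**2 - m0*m1*(x**2 + 2*x*y + y**2)
print(sp.factor(diff))   # a perfect square, (m1*x - m0*y)**2 up to the sign of the base
```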
The square of a real number is always greater than or equal to zero, so the inequality holds for every choice of $m_0$, $m_1$, $p_0$ and $p_1$, and since each step above is reversible, this proves the theorem. Equality holds exactly when $m_1\operatorname{E}[X] = m_0\operatorname{E}[Y]$, that is, when $p_0 = p_1$ and $Z$ is itself binomially distributed.