|
PDF
2 Basic Techniques
Proposition 1 (Union Bound). Given events $A_{1}, A_{2}, \ldots, A_{n}$, we have
$$
P\left(A_{1} \cup A_{2} \cup \cdots\cup A_{n}\right) \leq P\left(A_{1}\right)+P\left(A_{2}\right)+\cdots+P\left(A_{n}\right)
$$
From Union Bound, we see that if the sum of the probabilities $P\left(A_{1}\right), \ldots$, $P\left(A_{n}\right)$ is less than 1 , then the probability that none of the events $A_{1}, \ldots, A_{n}$ occur is positive.
Example 3. (Hong Kong) In each cell of a $100 \times 100$ matrix $A=\left\{a_{i j}\right\}$, one of the integers $1,2, \ldots, 5000$ is written. Moreover, each integer appears in the matrix exactly twice. Prove that one can choose 100 cells in the matrix satisfying the three conditions below:
(a) Exactly one cell is chosen in each row.
(b) Exactly one cell is chosen in each column.
(c) The numbers in the cells chosen are pairwise distinct.
Proof. Let $\left\{b_{1}, b_{2}, \ldots, b_{100}\right\}$ be a random permutation of $\{1,2, \ldots, 100\}$. Let's consider the 100 cells
$$
B=\left\{a_{1, b_{1}}, \quad a_{2, b_{2}}, \quad \ldots, \quad a_{100, b_{100}}\right\} .
$$
Clearly $B$ satisfies both (a) and (b). We compute the probability that $B$ does not satisfy (c). Let's consider any $n \in\{1,2, \ldots, 5000\}$. Two of the cells, say $a_{i j}$ and $a_{p q}$, in matrix $A$ have the number $n$ written in. If $i=p$ or $j=q$, then the probability that there exist two cells from $B$ with $n$ written in is 0 because none of the cells in $B$ are from the same row or column. Otherwise, we consider $a_{i, b_{i}}$ and $a_{p, b_{p}}$. The probability that $a_{i, b_{i}}=a_{p, b_{p}}=n$ is $\frac{1}{100} \cdot \frac{1}{99}$.
Since there are 5000 numbers, Union Bound tells us that the probability that any two of the numbers in $B$ are the same is no more than
$$
5000 \cdot \frac{1}{100} \cdot \frac{1}{99}<1 .
$$
Thus, for any random sampling of 100 cells satisfying (a) and (b), the probability that those cells are pairwise distinct is positive. Therefore, there exist 100 cells satisfying the three conditions (a), (b), and (c).$\square$
Proposition 2. If a random variable $X$ has expectation $E(X)=c$, then there must be a realization of $X$ with value at least $c$, and a realization of $X$ with value at most c.
For example, if the scores of a test are integers from 0 to 10 and the class average is $7.3$, then we know for sure that at least one student's score is 8 or higher, and at least one student's score is 7 or below.
Example 4. (Moscow 1993) In a botanical classifier, a plant is identified by 100 features. Each feature can either be present or absent. A classifier is considered to be good if any two plants have less than half of their features in common. Prove that a good classifier cannot describe more than 50 plants.
Proof. Let's pick a pair of plants uniformly at random from $n$ plants. Let $X_{i}=1$ if the two plants in the chosen pair have opposite $i$ th features and $X_{i}=0$ otherwise. Then let $X$ be the total number of features for which the plants differ. So $X=X_{1}+X_{2}+\cdots+X_{100}$. Thus, $E(X)=\sum_{i=1}^{100} E\left(X_{i}\right)=\sum_{i=1}^{100} P$ (the chosen pair has opposite $i$th features).
For $i \in\{1,2, \ldots, 100\}$, let $n_{i}$ be the number of plants with $i$ th feature present. Then the number of pairs of plants that have opposite $i$ th features is $n_{i}(n-$ $\left.n_{i}\right)$. Therefore,
$$
E(X)=\sum_{i=1}^{100} \frac{n_{i}\left(n-n_{i}\right)}{\left(\begin{array}{l}
n \\
2
\end{array}\right)}
$$
By AM-GM, $n_{i}\left(n-n_{i}\right) \leq \frac{n^{2}}{4}$. So
$$
E(X)=\sum_{i=1}^{100} \frac{n_{i}\left(n-n_{i}\right)}{\frac{n(n-1)}{2}} \leq \sum_{i=1}^{100} \frac{n^{2}}{2 n(n-1)}=100 \cdot \frac{n}{2(n-1)}=\frac{50 n}{n-1}
$$
When $n=51$, we get $E(X) \leq 51$. Note that $n_{i}\left(n-n_{i}\right)=\frac{n^{2}}{4}$ if and only if $n_{i}=\frac{n}{2}$. Since $n_{i}$ is an integer, the equality cannot hold when $n=51$. That is, when $n=51, E(X)<51$. Since $X$ is an integer, when there are 51 plants, there must exist a pair of plants such that the number of opposite features this pair has is $\leq 50$. In other words, there exists a pair of plants that have 50 or more common features. Therefore, a good classifier cannot describe 51 plants. Consequently, it cannot describe more than 50 plants.$\square$ |
|