A tiny remark about the Cauchy-Schwarz inequality
The Cauchy-Schwarz inequality is not hard to prove, so
there is not much reason for a page devoted to simplifying
the usual proof, or rather simplifying the usual presentation
of the usual proof. What is more, the idea that follows is
so natural that it must be well known to a significant
proportion of mathematicians. Hence the word tiny above.
Nevertheless, most textbooks and all analysis courses I
have attended favour the approach where you write down a
quadratic form, use the fact that it is non-negative everywhere,
and observe that this implies the Cauchy-Schwarz inequality. No
explanation is usually given of where the quadratic form comes
from. This page is intended for those who happen not to have
observed, or been shown, that more or less the same argument
can be made to seem much more natural. Indeed, this is another
example of a proof that a well-programmed computer could
reasonably be expected to discover.
First, let us consider the basic, real-analysis version
of the inequality, namely
$$a_{1}b_{1}+\dots+a_{n}b_{n}\leq(a_{1}^{2}+\dots+a_{n}^{2})^{1/2}(b_{1}^{2}+\dots+b_{n}^{2})^{1/2}$$
with equality if and only if the sequences $(a_{i})$ and
$(b_{i})$ are proportional.
How might one go about proving this statement using no tricks?
One idea is to try to find a natural way to express the fact that
two sequences are proportional. Of course, we could say something
like there exists a constant $λ$ such that $a_{i}=\lambda b_{i}$ for every $i$, but this introduces an unknown
constant $λ$, and it will make our proof harder later on if we
have to find this $λ$.
This is not a serious problem though, as we can identify $λ$
as something like $a_{1}/b_{1}$. And if we dislike the
lack of symmetry involved in choosing $a_{1}/b_{1}$
rather than some other $a_{i}/b_{i}$, we could
simply say that all the $a_{i}/b_{i}$ are equal.
This still leaves a minor problem that some of the $b_{i}$
may be zero, and the related minor problem that we are not dealing
with the $a_{i}$ and $b_{i}$ symmetrically. To get
round these small difficulties, let us define $(a_{i})$ and
$(b_{i})$ to be proportional if $a_{i}b_{j}=a_{j}b_{i}$ for every pair $i,j$.
Now we would like to express this fact analytically,
and for this there is a very standard idea. If you want lots of
real numbers to be zero then you can achieve this by insisting that
the sum of their squares is zero. In this case we want all the numbers
$a_{i}b_{j}-a_{j}b_{i}$ to be zero,
so the sequences $(a_{i})$ and $(b_{i})$ are proportional
if and only if
$$\sum_{i,j}(a_{i}b_{j}-a_{j}b_{i})^{2}=0$$
and the expression on the left is trivially at least zero.
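This criterion is easy to try out on small examples. The following is a minimal Python sketch; the function names `cross_term_sum` and `are_proportional` are illustrative choices, not anything standard, and exact entries (integers or fractions) are assumed so that the comparison with zero is meaningful.

```python
def cross_term_sum(a, b):
    """Return the sum over all pairs (i, j) of (a_i*b_j - a_j*b_i)**2.

    Assumes a and b have the same length.
    """
    n = len(a)
    return sum((a[i] * b[j] - a[j] * b[i]) ** 2
               for i in range(n) for j in range(n))


def are_proportional(a, b):
    """Proportional in the symmetric sense: a_i*b_j == a_j*b_i for all i, j.

    Assumes exact arithmetic (ints or Fractions); with floats one would
    compare the sum against a small tolerance instead of zero.
    """
    return cross_term_sum(a, b) == 0
```

For instance, `are_proportional([1, 2, 3], [2, 4, 6])` holds, while `are_proportional([1, 2, 3], [2, 4, 7])` does not. Zero entries cause no trouble, which was the reason for choosing the symmetric definition.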
Expanding out the bracket on the left hand side we get
$$\sum_{i,j}(a_{i}^{2}b_{j}^{2}+a_{j}^{2}b_{i}^{2}-2a_{i}b_{j}a_{j}b_{i})$$
which equals
$$2\Bigl(\sum_{i}a_{i}^{2}\Bigr)\Bigl(\sum_{j}b_{j}^{2}\Bigr)-2\Bigl(\sum_{i}a_{i}b_{i}\Bigr)^{2}$$
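Since this expression is at least zero, the square of the sum of the
$a_{i}b_{i}$ is at most the product of the sum of the $a_{i}^{2}$ and the
sum of the $b_{j}^{2}$, and taking square roots gives the inequality.
The equality case follows
immediately as well, provided that the two sequences are positive,
which we may clearly assume.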
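The expansion can be checked mechanically on a small example. Here is a rough Python sketch, assuming integer entries so that both sides are computed exactly; the name `lagrange_sides` is a made-up helper for this page.

```python
def lagrange_sides(a, b):
    """Return both sides of the identity obtained by expanding the bracket.

    lhs = sum over all i, j of (a_i*b_j - a_j*b_i)**2
    rhs = 2*(sum a_i^2)*(sum b_j^2) - 2*(sum a_i*b_i)**2
    Assumes a and b have the same length.
    """
    n = len(a)
    lhs = sum((a[i] * b[j] - a[j] * b[i]) ** 2
              for i in range(n) for j in range(n))
    sum_a2 = sum(t * t for t in a)
    sum_b2 = sum(t * t for t in b)
    dot = sum(s * t for s, t in zip(a, b))
    rhs = 2 * sum_a2 * sum_b2 - 2 * dot ** 2
    return lhs, rhs
```

With `a = [1, 2, 3]` and `b = [4, 5, 6]` both sides come to 108, and with the proportional pair `[1, 2]` and `[3, 6]` both sides are zero, as the equality case predicts.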
Note that the only idea above was to write down the proportionality
of the two sequences in a nice way. The rest of the argument was
an entirely mechanical manipulation. Can we do something similar
for the more abstract, inner-product-space version of the inequality?
For some reason the keyboard I am writing this on refuses to do
vertical bars, so I shall write $[x]$ for the norm of $x$ and $⟨ x,y ⟩$ for
the inner product of $x$ and $y$. Beginning with the real case, we would
like to show that $⟨ x,y ⟩$ is at most $[x][y]$, with equality if and only
if $x$ and $y$ are proportional with a positive constant. Can we express
the proportionality of $x$ and $y$ without using coordinates?
A first attempt is to say that $x$ and $y$ are proportional if
and only if $x/[x]$ and $y/[y]$ are equal. This is
not quite accurate (for example, $y$ might be $-x$), but the inaccuracy
works in our favour as the condition is in fact equivalent to
$x$ and $y$ being proportional with a positive constant. Bearing in
mind that we eventually want a nice expression to deal with,
let us rewrite this equality as $x[y]-y[x]=0$.
We now want some way of distinguishing zero amongst all
vectors in an inner-product space. We need go no further than
the axioms! Indeed, $x[y]-y[x]=0$ if and only if
$$[x[y]-y[x]]^{2}=0$$
I put the square in because one always likes to expand such
an expression in terms of inner products. Indeed, let us do just
that, obtaining that
$$2[x]^{2}[y]^{2}-2[x][y]⟨x,y⟩$$
is greater than or equal to zero, with equality only if
$x[y]-y[x]=0$. If either $[x]$ or $[y]$ is zero then the Cauchy-Schwarz
inequality is trivial. Otherwise, we can divide through by
$2[x][y]$ and obtain the inequality in general, with equality
if and only if $x/[x]$ and $y/[y]$ are equal, that is, if and
only if $x$ and $y$ are proportional with a positive constant of
proportionality.
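As a sanity check of the expansion in the real case, here is a small Python sketch using the ordinary dot product on lists of numbers as the inner product. That choice, and the helper names, are assumptions made purely for illustration.

```python
import math


def dot(x, y):
    """Inner product <x, y>, taken here to be the usual dot product."""
    return sum(s * t for s, t in zip(x, y))


def norm(x):
    """The norm [x] = <x, x>**(1/2)."""
    return math.sqrt(dot(x, x))


def direct_form(x, y):
    """[x[y] - y[x]]**2, computed by forming the vector x[y] - y[x] first."""
    nx, ny = norm(x), norm(y)
    d = [s * ny - t * nx for s, t in zip(x, y)]
    return dot(d, d)


def expanded_form(x, y):
    """2[x]^2[y]^2 - 2[x][y]<x, y>, the result of expanding the square."""
    nx, ny = norm(x), norm(y)
    return 2 * nx ** 2 * ny ** 2 - 2 * nx * ny * dot(x, y)
```

Taking `x = [3, 4]` and `y = [5, 12]`, so that the norms are 5 and 13, both functions return 260, which is non-negative as it must be. Dividing by `2 * 5 * 13` recovers the inequality 63 ≤ 65.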
The complex case is not much harder. This time $[x[y]-y[x]]^{2}$
expands out as
$$2[x]^{2}[y]^{2}-[x][y](⟨x,y⟩ + ⟨y,x⟩)$$
Let $w$ be a complex number of modulus 1 with the property
that $⟨x,wy⟩$ is real and non-negative, and therefore equal to
the modulus of $⟨x,y⟩$. Replacing $y$ with $wy$ we find that
the modulus of $⟨x,y⟩$ is at most $[x][y]$, with equality if and
only if $x[y]-wy[x]=0$. Thus, equality holds for the modulus of the
inner product if and only if $x$ and $y$ are proportional, from
which it is easy to see that it holds for the inner product
itself if and only if the constant of proportionality is real
and positive. (Choosing $w$ above is, admittedly, a trick, but
it is a very standard one.)
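The choice of $w$ can be made explicit. The sketch below assumes an inner product that is linear in the first argument and conjugate-linear in the second (the text does not fix a convention), in which case $w=⟨x,y⟩/|⟨x,y⟩|$ works. The helper names are hypothetical.

```python
import math


def inner(x, y):
    """<x, y> on lists of complex numbers.

    Assumed convention: linear in x, conjugate-linear in y.
    """
    return sum(s * complex(t).conjugate() for s, t in zip(x, y))


def norm(x):
    """The norm [x]; <x, x> is real, so only its real part is used."""
    return math.sqrt(inner(x, x).real)


def aligning_phase(x, y):
    """A complex number w of modulus 1 with <x, w*y> real and non-negative.

    Under the convention above, <x, w*y> = conj(w) * <x, y>, so
    w = <x, y> / |<x, y>| does the job; any unit w works if <x, y> = 0.
    """
    ip = inner(x, y)
    return ip / abs(ip) if ip != 0 else 1.0
```

For example, with `x = [1 + 1j, 2]` and `y = [1j, 1 - 1j]`, multiplying `y` by `aligning_phase(x, y)` makes the inner product real and equal, up to rounding, to the modulus of the original one, which is in turn at most `norm(x) * norm(y)`.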
The point of the above arguments is to contrast them with
the usual, slightly less motivated approach of considering the
expression $[x-cy]^{2}$, which is real and non-negative,
and then choosing a clever value of $c$ from which to deduce
the Cauchy-Schwarz inequality. Of course, $c$ can be justified
as the value that minimizes the quadratic expression that results
from expanding $[x-cy]^{2}$, but even so the idea of
writing down $[x-cy]^{2}$ in the first place is not an
obvious one.
Actually (this paragraph was added a day or two later)
it can be justified as follows. Two vectors $x$ and $y$ are
proportional if and only if 0 is a non-trivial linear
combination of the two. Moreover, if neither is zero, then
they are proportional if and only if $x-cy=0$ for some constant
$c$. If this does not happen, then the line of points of the
form $x-cy$ has some positive distance from 0, which we can
calculate by minimizing $[x-cy]$. However, it seems perverse to
bother with this calculation when we know that if $x-cy$ is ever
zero with $c$ positive, then $c$ must equal
$[x]/[y]$.
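For comparison, the calculation in the usual approach can be sketched as follows. This assumes the dot product on lists of rationals as the inner product, and uses the standard fact that $[x-cy]^{2}$ is minimized at $c=⟨x,y⟩/[y]^{2}$ when $y$ is non-zero; the function names are illustrative.

```python
from fractions import Fraction


def dot(x, y):
    """Inner product <x, y>, computed exactly with rationals."""
    return sum(Fraction(s) * Fraction(t) for s, t in zip(x, y))


def minimizing_c(x, y):
    """The value of c that minimizes [x - c*y]**2, assuming y is non-zero."""
    return dot(x, y) / dot(y, y)


def min_squared_distance(x, y):
    """The minimum of [x - c*y]**2 over all real c, assuming y is non-zero."""
    c = minimizing_c(x, y)
    d = [Fraction(s) - c * Fraction(t) for s, t in zip(x, y)]
    return dot(d, d)
```

With `x = [3, 4]` and `y = [5, 12]` the minimizing value is `Fraction(63, 169)`, whereas $[x]/[y]$ is 5/13. The two differ because these vectors are not proportional, and the minimum squared distance, `Fraction(256, 169)`, is accordingly positive. That minimum equals $[x]^{2}-⟨x,y⟩^{2}/[y]^{2}$, so its non-negativity is the Cauchy-Schwarz inequality again.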
Just for the record, I looked through my bookshelf for all
proofs that I could find of the Cauchy-Schwarz inequality. Only
Apostol (Mathematical Analysis, p.20 exercise 1-15) and Jeffreys
and Jeffreys (Methods of Mathematical Physics, 3rd Ed. p.54) prove
the inequality (for real numbers) this way. The identity that proves
it is known as Lagrange's identity. Even they merely ask you to note
that Lagrange's identity is true and that it implies the Cauchy-Schwarz
inequality. My point above is that the identity is an obvious
thing to write down.