|
Do Symmetric Problems Have Symmetric Solutions?
W. C. Waterhouse, The American Mathematical Monthly (MAA), pp. 378–387
Let $f\left(x_{1}, \ldots, x_{n}\right)$ and $g\left(x_{1}, \ldots, x_{n}\right)$ be two symmetric functions. On the set where $g$ stays equal to $g(r, \ldots, r)$, the function $f$ should have a local maximum or minimum at $(r, \ldots, r)$. Of course we do not yet have any idea why, or indeed whether, the principle should be true. All we know (from Bouniakovsky) is that no rudimentary symmetry argument will prove it. But at least we can test it by the general method for detecting a constrained extremum at a point $P$.

The classical way to remember this "Lagrange multiplier" method is to think of curves $c(t)$ that have $c(0)=P$ and lie in the set where $g$ is constant. Differentiating the condition $g(c(t))=$ constant, we find that
$$
0=\sum\left(D_{i} g\right)(P) c_{i}^{\prime}(0)
$$
which says that the curve's tangent vector $c^{\prime}(0)$ is perpendicular to the gradient vector $(\nabla g)(P)$ formed by the $D_{i} g=\partial g / \partial x_{i}$. Conversely, so long as $(\nabla g)(P)$ is nonzero, the implicit function theorem shows that every vector perpendicular to $(\nabla g)(P)$ occurs as the tangent to some such curve $c(t)$. Now if $f$ has a local (constrained) extremum at $P$, each $f(c(t))$ has an extremum at $0$, and its derivative vanishes there. Hence we have
$$
0=\sum\left(D_{i} f\right)(P) c_{i}^{\prime}(0)
$$
This says that $(\nabla f)(P)$ is also perpendicular to all the $c^{\prime}(0)$; since those tangent vectors fill out the whole hyperplane perpendicular to $(\nabla g)(P)$, it follows that $(\nabla f)(P)$ is a scalar multiple of $(\nabla g)(P)$.
Points where this condition holds (together with points where $\nabla g=0$ ) are the critical points for the constrained extremum problem.
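To summarize, the candidates are the simultaneous solutions $P$ of
$$
\nabla f(P)=\lambda\,(\nabla g)(P) \quad \text{and} \quad g(P)=g(r, \ldots, r)
$$
for some scalar $\lambda$, together with any points of the constraint set where $\nabla g$ vanishes.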
All this is familiar in general. Does something special happen when $f$ and $g$ are symmetric? Let us work out an example, say
$$
\begin{aligned}
&f\left(x_{1}, x_{2}, x_{3}\right)=x_{1} x_{2}+x_{2} x_{3}+x_{3} x_{1} \\
&g\left(x_{1}, x_{2}, x_{3}\right)=\left(x_{1} x_{2} x_{3}\right)^{2}
\end{aligned}
$$
The gradients of these are
$$
\begin{aligned}
&\nabla f=\left(x_{2}+x_{3}, x_{1}+x_{3}, x_{1}+x_{2}\right), \\
&\nabla g=2 x_{1} x_{2} x_{3}\left(x_{2} x_{3}, x_{1} x_{3}, x_{1} x_{2}\right) .
\end{aligned}
$$
We are interested in points $P$ of the form $(r, r, r)$, and there we have
$$
\begin{aligned}
&\nabla f(r, r, r)=(2 r, 2 r, 2 r)=2 r(1,1,1) \\
&\nabla g(r, r, r)=\left(2 r^{5}, 2 r^{5}, 2 r^{5}\right)=2 r^{5}(1,1,1)
\end{aligned}
$$
Thus $\nabla f$ is a multiple of $\nabla g$ simply because in each of them all the entries are equal. Once we see that this is what we want to prove, we have no difficulty proving it.
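Explicitly, when $r \neq 0$ we can read off the multiplier here:
$$
\nabla f(r, r, r)=r^{-4}\, \nabla g(r, r, r), \quad \text{that is,} \quad \lambda=r^{-4};
$$
when $r=0$, the gradient $\nabla g$ vanishes, a degenerate case we shall have to exclude later.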
LEMMA 1. Suppose $f\left(x_{1}, \ldots, x_{n}\right)$ is a symmetric differentiable function. Then at a point $x_{1}=\cdots=x_{n}=r$, all the $D_{i} f$ are equal.
Proof. Let $\pi$ be a permutation of $1,2, \ldots, n$. For any function $h$ we can define
$$
h_{\pi}\left(x_{1}, \ldots, x_{n}\right)=h\left(x_{\pi(1)}, \ldots, x_{\pi(n)}\right),
$$
and trivially then
$$
\left(D_{\pi(i)} h_{\pi}\right)\left(x_{1}, \ldots, x_{n}\right)=\left(D_{i} h\right)\left(x_{\pi(1)}, \ldots, x_{\pi(n)}\right) \text {. }
$$
In the present case all $f_{\pi}$ equal $f$ and all the $x_{i}$ are equal, so taking $\pi$ with $\pi(i)=1$ gives $\left(D_{1} f\right)(r, \ldots, r)=\left(D_{i} f\right)(r, \ldots, r)$ for every $i$. $\blacksquare$
With hindsight (and a little charity) we might say that Purkiss also managed to show that the points $(r, \ldots, r)$ are critical points. But then he undeniably fell into an error. When there is only one free variable, a critical point is "in general" a local maximum or minimum; that is, it will be one or the other except for degenerate cases where the second derivative vanishes. But in several variables this is not true, and nondegenerate critical points can equally well be saddle points.
Purkiss simply ignored this possibility, and thus he left a major gap in his argument. Yet it must be admitted that in our original examples we did not in fact encounter any saddle points.

Support for the principle can also be drawn from George Chrystal, who in 1889 included it in Part 2 of his famous textbook on algebra [4, II. 61-63]. Recognizing the inadequacy of Purkiss' proof, Chrystal treated only the case where $f$ and $g$ are symmetric polynomials; these he rewrote in terms of the elementary symmetric polynomials, and after some computation he was able to establish the principle (apart from some degenerate situations). With this encouragement, then, let us take up where Purkiss left off and try to show that for some reason we never get saddle points.
For this we need the second-order terms in the Lagrange multiplier method. These ought to be familiar, but in fact most advanced calculus books seem to skip them. Briefly, then, let us again consider one of our curves $c(t)$ lying in the set where $g=g(P)$. We have
$$
\begin{aligned}
0 &=\left.\frac{d^{2}}{d t^{2}} g(c(t))\right|_{t=0} \\
&=\sum_{i}\left(D_{i} g\right)(P) c_{i}^{\prime \prime}(0)+\sum_{i, j}\left(D_{i} D_{j} g\right)(P) c_{i}^{\prime}(0) c_{j}^{\prime}(0)
\end{aligned}
$$
and
$$
\left.\frac{d^{2}}{d t^{2}} f(c(t))\right|_{t=0}=\sum_{i}\left(D_{i} f\right)(P) c_{i}^{\prime \prime}(0)+\sum_{i, j}\left(D_{i} D_{j} f\right)(P) c_{i}^{\prime}(0) c_{j}^{\prime}(0) .
$$
We know that at our critical point we have $\nabla f(P)=\lambda(\nabla g)(P)$ for some scalar $\lambda$. Multiplying the first equation above by $\lambda$ and subtracting, we get
$$
\left.\frac{d^{2}}{d t^{2}} f(c(t))\right|_{t=0}=\sum_{i, j}\left[\left(D_{i} D_{j} f\right)(P)-\lambda\left(D_{i} D_{j} g\right)(P)\right] c_{i}^{\prime}(0) c_{j}^{\prime}(0)
$$
For a local maximum or minimum, we want these second derivatives to have the same sign for all $c(t)$. Thus the extra condition we need is that the quadratic form
$$
Q(v)=\sum\left[\left(D_{i} D_{j} f\right)(P)-\lambda\left(D_{i} D_{j} g\right)(P)\right] v_{i} v_{j}
$$
should be positive definite or negative definite on the space of all $v$ perpendicular to $\nabla g(P)$. Using the implicit function theorem, one can show that this condition is indeed sufficient $[5,154]$.
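In matrix language: writing $H_{h}=\left(D_{i} D_{j} h\right)_{i, j}$ for the Hessian matrix of a function $h$, the condition asks that
$$
Q(v)=v^{\top}\left(H_{f}(P)-\lambda H_{g}(P)\right) v
$$
be positive definite or negative definite on the hyperplane of vectors $v$ perpendicular to $\nabla g(P)$.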
This tells us what we need, but why should it automatically be true for symmetric $f$ and $g$ ? Let us look again at our example,
$$
f=x_{1} x_{2}+x_{2} x_{3}+x_{3} x_{1}, \quad g=\left(x_{1} x_{2} x_{3}\right)^{2} .
$$
At a point $(r, r, r)$ we find that the matrices of second partials are
$$
\begin{aligned}
& \left(D_i D_j f(P)\right)=\left(\begin{array}{lll}
0 & 1 & 1 \\
1 & 0 & 1 \\
1 & 1 & 0
\end{array}\right) \\
& \left(D_i D_j g(P)\right)=\left(\begin{array}{lll}
2 r^4 & 4 r^4 & 4 r^4 \\
4 r^4 & 2 r^4 & 4 r^4 \\
4 r^4 & 4 r^4 & 2 r^4
\end{array}\right) .
\end{aligned}
$$
What strikes the eye here is that all the diagonal entries are equal and all the off-diagonal entries are equal. Again, once we notice this we can easily check it in general.
LEMMA 2. Suppose $f\left(x_1, \ldots, x_n\right)$ is a symmetric twice-differentiable function. Then at a point $(r, \ldots, r)$ all $D_i D_i f$ are equal and all $D_i D_j f$ for $i \neq j$ are equal.
Proof. In the proof of Lemma 1 we saw that $D_{\pi(i)} h_\pi=\left(D_i h\right)_\pi$ for any $h$. Applying this rule twice, we get
$$
D_{\pi(i)} D_{\pi(j)} f=D_{\pi(i)} D_{\pi(j)} f_\pi=D_{\pi(i)}\left[\left(D_j f\right)_\pi\right]=\left[D_i D_j f\right]_\pi
$$
Take first $i=j$, and choose $\pi$ so $\pi(i)=1$; this gives us $\left(D_1 D_1 f\right)(r, \ldots, r)=\left(D_i D_i f\right)(r, \ldots, r)$. Take next any $i \neq j$, and choose $\pi$ so $\pi(i)=1$ and $\pi(j)=2$; then we get $\left(D_1 D_2 f\right)(r, \ldots, r)=\left(D_i D_j f\right)(r, \ldots, r)$. $\blacksquare$
A straightforward computation now gives us the final lemma we need:
LEMMA 3. Suppose a quadratic form $Q$ is given by
$$
Q\left(v_1, \ldots, v_n\right)=\sum_i b v_i^2+\sum_{i \neq j} c v_i v_j .
$$
Then for all $v$ satisfying $\sum v_i=0$, we have $Q(v)=(b-c) \sum v_i^2$.
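The computation behind this is one line: since $\sum v_i=0$, squaring gives
$$
0=\Bigl(\sum_i v_i\Bigr)^2=\sum_i v_i^2+\sum_{i \neq j} v_i v_j, \quad \text{so} \quad \sum_{i \neq j} v_i v_j=-\sum_i v_i^2,
$$
and substituting into $Q$ yields $Q(v)=b \sum v_i^2-c \sum v_i^2=(b-c) \sum v_i^2$.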
THEOREM (The Purkiss Principle). Let $f$ and $g$ be symmetric functions with continuous second derivatives in the neighborhood of a point $P=(r, \ldots, r)$. On the set where $g$ equals $g(P)$, the function $f$ will have a local maximum or minimum at $P$ except in degenerate cases.
Proof. Assuming $\nabla g(P) \neq 0$, we know from Lemma 1 that $\nabla f(P)$ has the form $\lambda \nabla g(P)$: both gradients have all their entries equal, and the common entry of $\nabla g(P)$ is nonzero. The second partial derivatives of $f$ and $g$ satisfy the equalities in Lemma 2, and hence the terms $\left(D_i D_j f\right)(P)-\lambda\left(D_i D_j g\right)(P)$ also satisfy those equalities. Lemma 1 shows that the vectors $v$ perpendicular to $\nabla g(P)$ are those with $\sum v_i=0$. Lemma 3 then shows that on those $v$ our quadratic form (if not identically zero) is positive or negative definite. The result then follows from the Lagrange multiplier criterion. $\blacksquare$
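To see the theorem in action, return to our example with $r \neq 0$, where $\lambda=r^{-4}$. From the two matrices of second partials computed above, the quadratic form has $b=0-r^{-4} \cdot 2 r^4=-2$ and $c=1-r^{-4} \cdot 4 r^4=-3$, so on the hyperplane $\sum v_i=0$ Lemma 3 gives
$$
Q(v)=(b-c) \sum v_i^2=\sum v_i^2>0 \quad \text{for } v \neq 0,
$$
and $f$ has a local constrained minimum at $(r, r, r)$.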
Two types of degeneracy were excluded from the proof: we assumed that $\nabla g(P)$ was nonzero, and then we assumed that the second-order terms in $f-\lambda g$ did not all vanish in the directions perpendicular to $\nabla g(P)$. Such exclusions are indeed necessary, and the Purkiss principle is not universally true. To see why a condition on $\nabla g$ is needed, consider the extreme case where $g$ is constant everywhere; there is then no constraint, and hardly any symmetric $f$ will have an unconstrained local maximum or minimum at $(r, \ldots, r)$. It is a little harder to find an example that fails with the second-order degeneracy, but here is one such case. Let the functions be
$$
\begin{aligned}
& f=x_1^4 x_2 x_3+x_2^4 x_3 x_1+x_3^4 x_1 x_2, \\
& g=x_1^3 x_2^3+x_2^3 x_3^3+x_3^3 x_1^3
\end{aligned}
$$
around the point $P=(1,1,1)$. Consider the curve consisting of points $(s t, s, s)$, where $t$ is close to 1 and $s$ is chosen to keep $g=3$ (that is, $s^6\left(2 t^3+1\right)=3$ ). For such points
$$
f(s t, s, s)=s^6\left(t^4+2 t\right)=3\left(t^4+2 t\right) /\left(2 t^3+1\right) .
$$
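Differentiating this quotient (nothing beyond the quotient rule is needed; the numerator collapses to a perfect square):
$$
\frac{d}{d t}\, \frac{3\left(t^4+2 t\right)}{2 t^3+1}=\frac{3\left[\left(4 t^3+2\right)\left(2 t^3+1\right)-6 t^2\left(t^4+2 t\right)\right]}{\left(2 t^3+1\right)^2}=\frac{6\left(t^3-1\right)^2}{\left(2 t^3+1\right)^2} .
$$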
This derivative is zero at $t=1$ but positive on both sides nearby. Thus $f$ has values less than 3 for $t=1-\varepsilon$ and bigger than 3 for $t=1+\varepsilon$, and the Purkiss principle fails. It is not at all surprising that such exceptions should exist. What is surprising is how widely the principle turns out to be correct.
|