Last edited by hbghlyj on 2022-6-28 09:07
Source: https://atmos.washington.edu/~dennis/MatrixCalculus.pdf#page=4
In the following discussion I will differentiate matrix quantities with respect to the elements
of the referenced matrices. Although no new concept is required to carry out such operations,
the element-by-element calculations involve cumbersome manipulations, and thus it is useful
to derive the necessary results and have them readily available.
Convention 3
Let
$$\mathbf y = ψ(\mathbf x), \tag{23}$$
where $\bf y$ is an $m$-element vector, and $\bf x$ is an $n$-element vector. The symbol
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{cccc}\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\ \frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}} & \cdots & \frac{\partial y_{2}}{\partial x_{n}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_{m}}{\partial x_{1}} & \frac{\partial y_{m}}{\partial x_{2}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}\end{array}\right]\tag{24}$$will denote the $m × n$ matrix of first-order partial derivatives of the transformation from $\bf x$ to $\bf y$. Such a matrix is called the Jacobian matrix of the transformation $ψ()$.
Notice that if $\bf x$ is actually a scalar in Convention 3 then the resulting Jacobian matrix
is an $m \times 1$ matrix; that is, a single column (a vector). On the other hand, if $\bf y$ is actually a
scalar in Convention 3 then the resulting Jacobian matrix is a $1 \times n$ matrix; that is, a single
row (the transpose of a vector).
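Convention 3 can be checked numerically. The following sketch (not part of the original text; the helper `numerical_jacobian` and the example map are my own) builds the $m \times n$ matrix of eq. (24) by forward differences:

```python
import numpy as np

def numerical_jacobian(psi, x, eps=1e-6):
    """Forward-difference approximation of eq. (24):
    entry (i, j) approximates d y_i / d x_j, giving an m x n matrix."""
    y0 = psi(x)
    J = np.empty((y0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (psi(x + dx) - y0) / eps
    return J

# Hypothetical example map psi(x) = (x1*x2, x1^2),
# whose exact Jacobian is [[x2, x1], [2*x1, 0]].
x = np.array([2.0, 3.0])
J = numerical_jacobian(lambda v: np.array([v[0] * v[1], v[0] ** 2]), x)
```

At $\mathbf{x} = (2, 3)$ the approximation should be close to $\left[\begin{smallmatrix}3 & 2\\ 4 & 0\end{smallmatrix}\right]$.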
Proposition 5 Let$${\bf y} = {\bf Ax}\tag{25} $$where $\mathbf{y}$ is $m \times 1, \mathbf{x}$ is $n \times 1, \mathbf{A}$ is $m \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, then
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\mathbf{A}
$$
Proof: Since the $i$th element of $\mathbf{y}$ is given by
$$
y_{i}=\sum_{k=1}^{n} a_{i k} x_{k}
$$
it follows that
$$
\frac{\partial y_{i}}{\partial x_{j}}=a_{i j}
$$
for all $i=1,2, \ldots, m, \quad j=1,2, \ldots, n$. Hence
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\mathbf{A}
$$
q.e.d.
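A quick numerical confirmation of Proposition 5 (my own sketch; the random $3 \times 4$ matrix is an arbitrary example): the finite-difference Jacobian of $\mathbf{y} = \mathbf{Ax}$ should recover $\mathbf{A}$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))      # fixed m x n matrix, independent of x
x = rng.standard_normal(4)

# Column j of the Jacobian of y = A x, by forward differences.
eps = 1e-6
J = np.empty((3, 4))
for j in range(4):
    dx = np.zeros(4)
    dx[j] = eps
    J[:, j] = (A @ (x + dx) - A @ x) / eps
```

Because $\mathbf{y}$ is linear in $\mathbf{x}$, the difference quotient is exact up to rounding, so `J` matches `A` to high precision.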
Proposition 6 Let
$$
\mathbf{y}=\mathbf{A x}
$$
where $\mathbf{y}$ is $m \times 1$, $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $m \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, as in Proposition 5. Suppose that $\mathbf{x}$ is a function of the vector $\mathbf{z}$, while $\mathbf{A}$ is independent of $\mathbf{z}$. Then
$$
\frac{\partial \mathbf{y}}{\partial\bf z}=\mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: Since the $i$th element of $\mathbf{y}$ is given by
$$
y_{i}=\sum_{k=1}^{n} a_{i k} x_{k}
$$
for all $i=1,2, \ldots, m$, it follows that
$$
\frac{\partial y_{i}}{\partial z_{j}}=\sum_{k=1}^{n} a_{i k} \frac{\partial x_{k}}{\partial z_{j}}
$$
but the right-hand side of the above is simply element $(i, j)$ of $\mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}$. Hence
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{z}}=\mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
q.e.d.
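The chain rule of Proposition 6 can also be verified numerically. In this sketch (the matrix $\mathbf{A}$ and the map $\mathbf{x}(\mathbf{z})$ are invented examples), the finite-difference Jacobian of $\mathbf{y}(\mathbf{z}) = \mathbf{A}\mathbf{x}(\mathbf{z})$ is compared against $\mathbf{A}\,\partial\mathbf{x}/\partial\mathbf{z}$:

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])  # 3 x 2, independent of z

def x_of_z(z):             # example map x(z) = (z1^2, z1*z2)
    return np.array([z[0] ** 2, z[0] * z[1]])

def dx_dz(z):              # its analytic Jacobian
    return np.array([[2 * z[0], 0.0], [z[1], z[0]]])

z = np.array([1.5, -0.5])
eps = 1e-6
J = np.empty((3, 2))       # finite-difference Jacobian of y(z) = A x(z)
for j in range(2):
    dz = np.zeros(2)
    dz[j] = eps
    J[:, j] = (A @ x_of_z(z + dz) - A @ x_of_z(z)) / eps
```

`J` should agree with `A @ dx_dz(z)` up to the finite-difference truncation error.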
Proposition 7 Let the scalar $\alpha$ be defined by
$$
\alpha=\mathbf{y}^{\top} \mathbf{A} \mathbf{x}
$$
where $\mathbf{y}$ is $m\times 1, \mathbf{x}$ is $n \times 1, \mathbf{A}$ is $m\times n$, and $\mathbf{A}$ is independent of $\mathbf{x}$ and $\mathbf{y}$, then$$\frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{y}^{\top} \mathbf{A}$$
and
$$
\frac{\partial \alpha}{\partial \mathbf{y}}=\mathbf{x}^{\top} \mathbf{A}^{\top}
$$
Proof: Define
$$
\mathbf{w}^{\top}=\mathbf{y}^{\top} \mathbf{A}
$$
and note that
$$
\alpha=\mathbf{w}^{\top} \mathbf{x}
$$
Hence, by Proposition 5 we have that
$$
\frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{w}^{\top}=\mathbf{y}^{\top} \mathbf{A}
$$
which is the first result. Since $\alpha$ is a scalar, we can write
$$
\alpha=\alpha^{\top}=\mathbf{x}^{\top} \mathbf{A}^{\top} \mathbf{y}
$$
and applying Proposition 5 as before we obtain
$$
\frac{\partial \alpha}{\partial \mathbf{y}}=\mathbf{x}^{\top} \mathbf{A}^{\top}
$$
q.e.d.
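Both gradients of Proposition 7 can be checked at once with random data (my own sketch; the dimensions and seed are arbitrary). Per Convention 3, a scalar differentiated with respect to a vector is a row, stored here as a 1-d array:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))      # fixed, independent of x and y
y = rng.standard_normal(3)
x = rng.standard_normal(4)

alpha = lambda y, x: y @ A @ x       # the scalar bilinear form
eps = 1e-6

# d alpha / d x is 1 x n; d alpha / d y is 1 x m.
grad_x = np.array([(alpha(y, x + eps * np.eye(4)[j]) - alpha(y, x)) / eps
                   for j in range(4)])
grad_y = np.array([(alpha(y + eps * np.eye(3)[i], x) - alpha(y, x)) / eps
                   for i in range(3)])
```

`grad_x` should match `y @ A`, and `grad_y` should match `A @ x` (which is $\mathbf{x}^\top\mathbf{A}^\top$ viewed as a 1-d array).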
Proposition 8 For the special case in which the scalar $\alpha$ is given by the quadratic form
$$
\alpha=\mathbf{x}^{\top} \mathbf{A} \mathbf{x}
$$
where $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $n \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, then
$$
\frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{x}^{\top}\left(\mathbf{A}+\mathbf{A}^{\top}\right)
$$
Proof: By definition
$$
\alpha=\sum_{j=1}^{n} \sum_{i=1}^{n} a_{i j} x_{i} x_{j}
$$
Differentiating with respect to the $k$th element of $\mathbf{x}$ we have
$$
\frac{\partial \alpha}{\partial x_{k}}=\sum_{j=1}^{n} a_{k j} x_{j}+\sum_{i=1}^{n} a_{i k} x_{i}
$$
for all $k=1,2, \ldots, n$, and consequently,
$$
\frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{x}^{\top} \mathbf{A}^{\top}+\mathbf{x}^{\top} \mathbf{A}=\mathbf{x}^{\top}\left(\mathbf{A}^{\top}+\mathbf{A}\right)
$$
q.e.d.
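A numerical check of Proposition 8 (my own sketch with an arbitrary random matrix). Taking $\mathbf{A}$ deliberately non-symmetric shows why both terms $\mathbf{A}$ and $\mathbf{A}^\top$ are needed:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))      # deliberately NOT symmetric
x = rng.standard_normal(4)

f = lambda v: v @ A @ v              # the quadratic form
eps = 1e-6
grad = np.array([(f(x + eps * np.eye(4)[k]) - f(x)) / eps for k in range(4)])
```

`grad` should agree with `x @ (A + A.T)`, while `2 * x @ A` would generally be wrong here.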
Proposition 9 For the special case where $\mathbf{A}$ is a symmetric matrix and
$$
\alpha=\mathbf{x}^{\top} \mathbf{A} \mathbf{x}
$$
where $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $n \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, then
$$
\frac{\partial \alpha}{\partial \mathbf{x}}=2 \mathbf{x}^{\top} \mathbf{A}
$$
Proof: This is an obvious application of Proposition 8. q.e.d.
Proposition 10 Let the scalar $\alpha$ be defined by
$$
\alpha=\mathbf{y}^{\top} \mathbf{x}
$$
where $\mathbf{y}$ is $n \times 1, \mathbf{x}$ is $n \times 1$, and both $\mathbf{y}$ and $\mathbf{x}$ are functions of the vector $\mathbf{z}$. Then
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: We have
$$
\alpha=\sum_{j=1}^{n} x_{j} y_{j}
$$
Differentiating with respect to the $k$ th element of $\mathbf{z}$ we have
$$
\frac{\partial \alpha}{\partial z_{k}}=\sum_{j=1}^{n}\left(x_{j} \frac{\partial y_{j}}{\partial z_{k}}+y_{j} \frac{\partial x_{j}}{\partial z_{k}}\right)
$$
for all $k=1,2, \ldots,n$, and consequently,
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\frac{\partial \alpha}{\partial \mathbf{y}} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\frac{\partial \alpha}{\partial \mathbf{x}} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}=\mathbf{x}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
q.e.d.
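The product rule of Proposition 10 can be tested with two invented vector-valued maps of $\mathbf{z}$ (everything below is my own example, not from the original text):

```python
import numpy as np

def y_of_z(z):   # example y(z) = (z1 + z2, z1*z2, z2^2)
    return np.array([z[0] + z[1], z[0] * z[1], z[1] ** 2])

def x_of_z(z):   # example x(z) = (z2, z1^2, 1)
    return np.array([z[1], z[0] ** 2, 1.0])

def jac(f, z, eps=1e-6):   # forward-difference Jacobian (Convention 3)
    f0 = f(z)
    return np.column_stack([(f(z + eps * np.eye(2)[j]) - f0) / eps
                            for j in range(2)])

z = np.array([0.7, -1.2])
alpha = lambda w: y_of_z(w) @ x_of_z(w)
grad = np.array([(alpha(z + 1e-6 * np.eye(2)[k]) - alpha(z)) / 1e-6
                 for k in range(2)])
rhs = x_of_z(z) @ jac(y_of_z, z) + y_of_z(z) @ jac(x_of_z, z)
```

`grad` and `rhs` should agree up to the finite-difference error in both sides.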
Proposition 11 Let the scalar $\alpha$ be defined by
$$
\alpha=\mathbf{x}^{\top} \mathbf{x}
$$
where $\mathbf{x}$ is $n \times 1$, and $\mathbf{x}$ is a function of the vector $\mathbf{z}$. Then
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=2 \mathbf{x}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: This is an obvious application of Proposition 10. q.e.d.
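Proposition 11 is the gradient of a squared norm, which is worth a direct check (the map $\mathbf{x}(\mathbf{z})$ below is an invented example):

```python
import numpy as np

def x_of_z(z):   # example x(z) = (sin z1, z1*z2)
    return np.array([np.sin(z[0]), z[0] * z[1]])

def jac(f, z, eps=1e-6):   # forward-difference Jacobian (Convention 3)
    f0 = f(z)
    return np.column_stack([(f(z + eps * np.eye(2)[j]) - f0) / eps
                            for j in range(2)])

z = np.array([0.3, 2.0])
alpha = lambda w: x_of_z(w) @ x_of_z(w)      # squared Euclidean norm
grad = np.array([(alpha(z + 1e-6 * np.eye(2)[k]) - alpha(z)) / 1e-6
                 for k in range(2)])
```

`grad` should agree with `2 * x_of_z(z) @ jac(x_of_z, z)`.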
Proposition 12 Let the scalar $\alpha$ be defined by
$$
\alpha=\mathbf{y}^{\top} \mathbf{A} \mathbf{x}
$$
where $\mathbf{y}$ is $m \times 1, \mathbf{x}$ is $n \times 1, \mathbf{A}$ is $m \times n$, and both $\mathbf{y}$ and $\mathbf{x}$ are functions of the vector $\mathbf{z}$, while $\mathbf{A}$ does not depend on $\mathbf{z}$. Then
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top} \mathbf{A}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: Define
$$
\mathbf{w}^{\top}=\mathbf{y}^{\top} \mathbf{A}
$$
and note that
$$
\alpha=\mathbf{w}^{\top} \mathbf{x}
$$
Applying Proposition 10 we have
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top} \frac{\partial \mathbf{w}}{\partial \mathbf{z}}+\mathbf{w}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Substituting back in for $\mathbf{w}$ we arrive at
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\frac{\partial \alpha}{\partial \mathbf{y}} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\frac{\partial \alpha}{\partial \mathbf{x}} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}=\mathbf{x}^{\top} \mathbf{A}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
q.e.d.
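A numerical check of Proposition 12, with $\mathbf{y}^\top\mathbf{A}$ and $\mathbf{x}^\top\mathbf{A}^\top$ represented as the 1-d arrays `y_of_z(z) @ A` and `A @ x_of_z(z)` (the matrix and both maps are invented examples of my own):

```python
import numpy as np

A = np.array([[1.0, -1.0], [2.0, 0.5], [0.0, 3.0]])  # 3 x 2, fixed

def y_of_z(z):   # example y(z), m = 3
    return np.array([z[0], z[1] ** 2, z[0] * z[1]])

def x_of_z(z):   # example x(z), n = 2
    return np.array([np.cos(z[0]), z[1]])

def jac(f, z, eps=1e-6):   # forward-difference Jacobian (Convention 3)
    f0 = f(z)
    return np.column_stack([(f(z + eps * np.eye(2)[j]) - f0) / eps
                            for j in range(2)])

z = np.array([0.4, 1.1])
alpha = lambda w: y_of_z(w) @ A @ x_of_z(w)
grad = np.array([(alpha(z + 1e-6 * np.eye(2)[k]) - alpha(z)) / 1e-6
                 for k in range(2)])
rhs = (A @ x_of_z(z)) @ jac(y_of_z, z) + (y_of_z(z) @ A) @ jac(x_of_z, z)
```

`grad` and `rhs` should agree up to the finite-difference error.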
Proposition 13 Let the scalar $\alpha$ be defined by the quadratic form
$$
\alpha=\mathbf{x}^{\top} \mathbf{A} \mathbf{x}
$$
where $\mathbf{x}$ is $n \times 1$, $\bf A$ is $n \times n$, and $\mathbf{x}$ is a function of the vector $\mathbf{z}$, while $\mathbf{A}$ does not depend on $\mathbf{z}$. Then
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top}\left(\mathbf{A}+\mathbf{A}^{\top}\right) \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: This is an obvious application of Proposition 12. q.e.d.
Proposition 14 For the special case where $\mathbf{A}$ is a symmetric matrix and
$$
\alpha=\mathbf{x}^{\top} \mathbf{A} \mathbf{x}
$$
where $\mathbf{x}$ is $n\times 1$, $\mathbf{A}$ is $n\times n$, and $\mathbf{x}$ is a function of the vector $\mathbf{z}$, while $\mathbf{A}$ does not depend on $\mathbf{z}$. Then
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=2 \mathbf{x}^{\top} \mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: This is an obvious application of Proposition 13. q.e.d.
Definition 5 Let $\mathbf{A}$ be a $m \times n$ matrix whose elements are functions of the scalar parameter $\alpha$. Then the derivative of the matrix $\mathbf{A}$ with respect to the scalar parameter $\alpha$ is the $m \times n$ matrix of element-by-element derivatives:
$$
\frac{\partial \mathbf{A}}{\partial \alpha}=\left[\begin{array}{cccc}
\frac{\partial a_{11}}{\partial \alpha} & \frac{\partial a_{12}}{\partial \alpha} & \ldots & \frac{\partial a_{1 n}}{\partial \alpha} \\
\frac{\partial a_{21}}{\partial \alpha} & \frac{\partial a_{22}}{\partial \alpha} & \ldots & \frac{\partial a_{2 n}}{\partial \alpha} \\
\vdots & \vdots & & \vdots \\
\frac{\partial a_{m 1}}{\partial \alpha} & \frac{\partial a_{m 2}}{\partial \alpha} & \ldots & \frac{\partial a_{m n}}{\partial \alpha}
\end{array}\right]
$$
Proposition 15 Let $\mathbf{A}$ be a nonsingular, $m\times m$ matrix whose elements are functions of the scalar parameter $\alpha$. Then
$$
\frac{\partial \mathbf{A}^{-1}}{\partial \alpha}=-\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \alpha} \mathbf{A}^{-1}
$$
Proof: Start with the definition of the inverse
$$
\mathbf{A}^{-1} \mathbf{A}=\mathbf{I}
$$
and differentiate, yielding
$$
\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \alpha}+\frac{\partial \mathbf{A}^{-1}}{\partial \alpha} \mathbf{A}=\mathbf{0}
$$
rearranging the terms yields
$$
\frac{\partial \mathbf{A}^{-1}}{\partial \alpha}=-\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \alpha} \mathbf{A}^{-1}
$$
q.e.d.
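Proposition 15 is easy to verify with a small parameterized matrix (the family $\mathbf{A}(\alpha)$ below is an invented example, nonsingular near $\alpha = 0.3$):

```python
import numpy as np

def A_of(a):    # example family A(alpha), nonsingular for small a >= 0
    return np.array([[1.0 + a, a], [a, 2.0]])

def dA_da(a):   # its analytic derivative dA/d alpha
    return np.array([[1.0, 1.0], [1.0, 0.0]])

a = 0.3
eps = 1e-6
# Forward-difference derivative of the inverse ...
dAinv = (np.linalg.inv(A_of(a + eps)) - np.linalg.inv(A_of(a))) / eps
# ... versus the closed form -A^{-1} (dA/d alpha) A^{-1}.
Ainv = np.linalg.inv(A_of(a))
rhs = -Ainv @ dA_da(a) @ Ainv
```

`dAinv` should agree with `rhs` to roughly the size of the step `eps`.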