Matrix Differentiation


Posted by hbghlyj on 2022-06-27 22:38
https://atmos.washington.edu/~dennis/MatrixCalculus.pdf#page=4

In the following discussion I will differentiate matrix quantities with respect to the elements of the referenced matrices. Although no new concept is required to carry out such operations, the element-by-element calculations involve cumbersome manipulations; it is therefore useful to derive the necessary results and have them readily available.

Convention 3

Let $$\mathbf y = ψ(\mathbf x), \tag{23}$$ where $\bf y$ is an $m$-element vector, and $\bf x$ is an $n$-element vector. The symbol $$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{cccc}\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\ \frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}} & \cdots & \frac{\partial y_{2}}{\partial x_{n}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_{m}}{\partial x_{1}} & \frac{\partial y_{m}}{\partial x_{2}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}\end{array}\right]\tag{24}$$will denote the $m × n$ matrix of first-order partial derivatives of the transformation from $\bf x$ to $\bf y$. Such a matrix is called the Jacobian matrix of the transformation $ψ()$.
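Convention 3 can be made concrete with a small numerical routine (a sketch added here, not part of the original note): `jacobian` below approximates (24) by central differences, so that entry $(i,j)$ holds $\partial y_i/\partial x_j$.

```python
import numpy as np

def jacobian(psi, x, eps=1e-6):
    """Approximate the m x n Jacobian of (24): entry (i, j) is dy_i/dx_j."""
    m = np.atleast_1d(psi(x)).size
    J = np.zeros((m, x.size))
    for j, e in enumerate(np.eye(x.size)):
        J[:, j] = (psi(x + eps * e) - psi(x - eps * e)) / (2 * eps)
    return J

# Example: psi(x) = (x1*x2, x1**2) has Jacobian [[x2, x1], [2*x1, 0]]
J = jacobian(lambda x: np.array([x[0] * x[1], x[0] ** 2]), np.array([2.0, 3.0]))
assert np.allclose(J, [[3.0, 2.0], [4.0, 0.0]], atol=1e-6)
```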

Notice that if $\bf x$ is actually a scalar in Convention 3 then the resulting Jacobian matrix is an $m × 1$ matrix; that is, a single column (a vector). On the other hand, if $\bf y$ is actually a scalar in Convention 3 then the resulting Jacobian matrix is a $1 × n$ matrix; that is, a single row (the transpose of a vector).

Proposition 5 Let$${\bf y} = {\bf Ax}\tag{25} $$where $\mathbf{y}$ is $m \times 1, \mathbf{x}$ is $n \times 1, \mathbf{A}$ is $m \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, then $$ \frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\mathbf{A} $$ Proof: Since the $i$th element of $\mathbf{y}$ is given by $$ y_{i}=\sum_{k=1}^{n} a_{i k} x_{k} $$ it follows that $$ \frac{\partial y_{i}}{\partial x_{j}}=a_{i j} $$ for all $i=1,2, \ldots, m, \quad j=1,2, \ldots, n$. Hence $$ \frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\mathbf{A} $$ q.e.d.
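As a numerical sanity check (added here; the sizes and seed are arbitrary), Proposition 5 says the central-difference Jacobian of $\mathbf y = \mathbf{Ax}$ should recover $\mathbf A$ exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))   # A is m x n and does not depend on x
x = rng.standard_normal(n)
eps = 1e-6

# Central-difference Jacobian of y = A x, built column by column as in (24)
J = np.zeros((m, n))
for j, e in enumerate(np.eye(n)):
    J[:, j] = (A @ (x + eps * e) - A @ (x - eps * e)) / (2 * eps)

assert np.allclose(J, A, atol=1e-6)   # dy/dx = A
```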
Proposition 6 Let $$ \mathbf{y}=\mathbf{A x} $$ where $\mathbf{y}$ is $m \times 1, \mathbf{x}$ is $n \times 1, \mathbf{A}$ is $m \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, as in Proposition 5 . Suppose that $\mathbf{x}$ is a function of the vector $\mathbf{z}$, while $\mathbf{A}$ is independent of $\mathbf{z}$. Then $$ \frac{\partial \mathbf{y}}{\partial\bf z}=\mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}} $$ Proof: Since the $i$th element of $\mathbf{y}$ is given by $$ y_{i}=\sum_{k=1}^{n} a_{i k} x_{k} $$ for all $i=1,2, \ldots, m$, it follows that $$ \frac{\partial y_{i}}{\partial z_{j}}=\sum_{k=1}^{n} a_{i k} \frac{\partial x_{k}}{\partial z_{j}} $$ but the right hand side of the above is simply element $(i, j)$ of $\mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}$. Hence $$ \frac{\partial \mathbf{y}}{\partial \mathbf{z}}=\mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}} $$ q.e.d.
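The chain rule of Proposition 6 can be checked the same way (a sketch; the particular $\mathbf x(\mathbf z)$ below is an arbitrary smooth choice):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))                                  # independent of z
x = lambda z: np.array([z[0] * z[1], np.sin(z[2]), z[0] ** 2])   # x(z), 3-vector
z0 = np.array([0.7, -0.4, 1.1])
eps = 1e-6

def jac(f, z):
    # central-difference Jacobian, laid out as in (24)
    return np.array([(f(z + eps * e) - f(z - eps * e)) / (2 * eps)
                     for e in np.eye(z.size)]).T

lhs = jac(lambda z: A @ x(z), z0)   # dy/dz computed directly
rhs = A @ jac(x, z0)                # A (dx/dz)
assert np.allclose(lhs, rhs, atol=1e-6)
```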
Proposition 7 Let the scalar $\alpha$ be defined by $$ \alpha=\mathbf{y}^{\top} \mathbf{A} \mathbf{x} $$ where $\mathbf{y}$ is $m\times 1, \mathbf{x}$ is $n \times 1, \mathbf{A}$ is $m\times n$, and $\mathbf{A}$ is independent of $\mathbf{x}$ and $\mathbf{y}$, then$$\frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{y}^{\top} \mathbf{A}$$ and $$ \frac{\partial \alpha}{\partial \mathbf{y}}=\mathbf{x}^{\top} \mathbf{A}^{\top} $$ Proof: Define $$ \mathbf{w}^{\top}=\mathbf{y}^{\top} \mathbf{A} $$ and note that $$ \alpha=\mathbf{w}^{\top} \mathbf{x} $$ Hence, by Proposition 5 we have that $$ \frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{w}^{\top}=\mathbf{y}^{\top} \mathbf{A} $$ which is the first result. Since $\alpha$ is a scalar, we can write $$ \alpha=\alpha^{\top}=\mathbf{x}^{\top} \mathbf{A}^{\top} \mathbf{y} $$ and applying Proposition 5 as before we obtain $$ \frac{\partial \alpha}{\partial \mathbf{y}}=\mathbf{x}^{\top} \mathbf{A}^{\top} $$ q.e.d.
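Both gradients in Proposition 7 can be verified numerically (a sketch; note that $\mathbf x^\top\mathbf A^\top = (\mathbf{Ax})^\top$, so the second check compares against `A @ x`):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = rng.standard_normal(m)
eps = 1e-6

alpha = lambda y, x: y @ A @ x

# gradient w.r.t. x (a 1 x n row): y^T A
gx = np.array([(alpha(y, x + eps * e) - alpha(y, x - eps * e)) / (2 * eps)
               for e in np.eye(n)])
assert np.allclose(gx, y @ A, atol=1e-6)

# gradient w.r.t. y (a 1 x m row): x^T A^T, i.e. (A x)^T
gy = np.array([(alpha(y + eps * e, x) - alpha(y - eps * e, x)) / (2 * eps)
               for e in np.eye(m)])
assert np.allclose(gy, A @ x, atol=1e-6)
```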
Proposition 8 For the special case in which the scalar $\alpha$ is given by the quadratic form $$ \alpha=\mathbf{x}^{\top} \mathbf{A} \mathbf{x} $$ where $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $n \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, then $$ \frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{x}^{\top}\left(\mathbf{A}+\mathbf{A}^{\top}\right) $$ Proof: By definition $$ \alpha=\sum_{j=1}^{n} \sum_{i=1}^{n} a_{i j} x_{i} x_{j} $$ Differentiating with respect to the $k$th element of $\mathbf{x}$ we have $$ \frac{\partial \alpha}{\partial x_{k}}=\sum_{j=1}^{n} a_{k j} x_{j}+\sum_{i=1}^{n} a_{i k} x_{i} $$ for all $k=1,2, \ldots, n$, and consequently, $$ \frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{x}^{\top} \mathbf{A}^{\top}+\mathbf{x}^{\top} \mathbf{A}=\mathbf{x}^{\top}\left(\mathbf{A}^{\top}+\mathbf{A}\right) $$ q.e.d.
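A quick check of Proposition 8 with a deliberately non-symmetric $\mathbf A$ (added here as a sketch) confirms that the gradient is $\mathbf x^\top(\mathbf A + \mathbf A^\top)$, not $2\mathbf x^\top\mathbf A$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))   # not necessarily symmetric
x = rng.standard_normal(n)
eps = 1e-6

f = lambda x: x @ A @ x
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(n)])
assert np.allclose(grad, x @ (A + A.T), atol=1e-6)
```

When $\mathbf A$ is symmetric, $\mathbf A + \mathbf A^\top = 2\mathbf A$ and this reduces to Proposition 9.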
Proposition 9 For the special case where $\mathbf{A}$ is a symmetric matrix and $$ \alpha=\mathbf{x}^{\top} \mathbf{A} \mathbf{x} $$ where $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $n \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, then $$ \frac{\partial \alpha}{\partial \mathbf{x}}=2 \mathbf{x}^{\top} \mathbf{A} $$ Proof: This is an obvious application of Proposition 8. q.e.d.
Proposition 10 Let the scalar $\alpha$ be defined by $$ \alpha=\mathbf{y}^{\top} \mathbf{x} $$ where $\mathbf{y}$ is $n \times 1, \mathbf{x}$ is $n \times 1$, and both $\mathbf{y}$ and $\mathbf{x}$ are functions of the vector $\mathbf{z}$. Then $$ \frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}} $$ Proof: We have $$ \alpha=\sum_{j=1}^{n} x_{j} y_{j} $$ Differentiating with respect to the $k$ th element of $\mathbf{z}$ we have $$ \frac{\partial \alpha}{\partial z_{k}}=\sum_{j=1}^{n}\left(x_{j} \frac{\partial y_{j}}{\partial z_{k}}+y_{j} \frac{\partial x_{j}}{\partial z_{k}}\right) $$ for all $k=1,2, \ldots,n$, and consequently, $$ \frac{\partial \alpha}{\partial \mathbf{z}}=\frac{\partial \alpha}{\partial \mathbf{y}} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\frac{\partial \alpha}{\partial \mathbf{x}} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}=\mathbf{x}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}} $$ q.e.d.
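The product rule of Proposition 10 checks out numerically as well (a sketch with arbitrary smooth choices of $\mathbf x(\mathbf z)$ and $\mathbf y(\mathbf z)$):

```python
import numpy as np

z0 = np.array([0.5, -1.2])
x = lambda z: np.array([z[0] ** 2, z[1], z[0] * z[1]])
y = lambda z: np.array([np.sin(z[0]), z[1] ** 2, z[0] + z[1]])
eps = 1e-6

def jac(f, z):
    # central-difference Jacobian, laid out as in (24)
    return np.array([(f(z + eps * e) - f(z - eps * e)) / (2 * eps)
                     for e in np.eye(z.size)]).T

alpha = lambda z: y(z) @ x(z)
g = np.array([(alpha(z0 + eps * e) - alpha(z0 - eps * e)) / (2 * eps)
              for e in np.eye(2)])
assert np.allclose(g, x(z0) @ jac(y, z0) + y(z0) @ jac(x, z0), atol=1e-6)
```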
Proposition 11 Let the scalar $\alpha$ be defined by $$ \alpha=\mathbf{x}^{\top} \mathbf{x} $$ where $\mathbf{x}$ is $n \times 1$, and $\mathbf{x}$ is a function of the vector $\mathbf{z}$. Then $$ \frac{\partial \alpha}{\partial \mathbf{z}}=2 \mathbf{x}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}} $$ Proof: This is an obvious application of Proposition 10. q.e.d.
Proposition 12 Let the scalar $\alpha$ be defined by $$ \alpha=\mathbf{y}^{\top} \mathbf{A} \mathbf{x} $$ where $\mathbf{y}$ is $m \times 1, \mathbf{x}$ is $n \times 1, \mathbf{A}$ is $m \times n$, and both $\mathbf{y}$ and $\mathbf{x}$ are functions of the vector $\mathbf{z}$, while $\mathbf{A}$ does not depend on $\mathbf{z}$. Then $$ \frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top} \mathbf{A}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}} $$ Proof: Define $$ \mathbf{w}^{\top}=\mathbf{y}^{\top} \mathbf{A} $$ and note that $$ \alpha=\mathbf{w}^{\top} \mathbf{x} $$ Applying Proposition 10 we have $$ \frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top} \frac{\partial \mathbf{w}}{\partial \mathbf{z}}+\mathbf{w}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}} $$ Substituting back in for $\mathbf{w}$ we arrive at $$ \frac{\partial \alpha}{\partial \mathbf{z}}=\frac{\partial \alpha}{\partial \mathbf{y}} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\frac{\partial \alpha}{\partial \mathbf{x}} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}=\mathbf{x}^{\top} \mathbf{A}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}} $$ q.e.d.
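Proposition 12 combines the two previous patterns; a numerical sketch (arbitrary smooth $\mathbf y(\mathbf z)$, $\mathbf x(\mathbf z)$, constant $\mathbf A$):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 2))                            # independent of z
y = lambda z: np.array([z[0] ** 2, z[1], z[0] * z[1]])     # 3-vector
x = lambda z: np.array([np.cos(z[0]), z[1] ** 3])          # 2-vector
z0 = np.array([0.3, -0.8])
eps = 1e-6

def jac(f, z):
    return np.array([(f(z + eps * e) - f(z - eps * e)) / (2 * eps)
                     for e in np.eye(z.size)]).T

alpha = lambda z: y(z) @ A @ x(z)
g = np.array([(alpha(z0 + eps * e) - alpha(z0 - eps * e)) / (2 * eps)
              for e in np.eye(2)])
assert np.allclose(g, x(z0) @ A.T @ jac(y, z0) + y(z0) @ A @ jac(x, z0),
                   atol=1e-5)
```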
Proposition 13 Let the scalar $\alpha$ be defined by the quadratic form $$ \alpha=\mathbf{x}^{\top} \mathbf{A} \mathbf{x} $$ where $\mathbf{x}$ is $n \times 1$, $\bf A$ is $n \times n$, and $\mathbf{x}$ is a function of the vector $\mathbf{z}$, while $\mathbf{A}$ does not depend on $\mathbf{z}$. Then $$ \frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top}\left(\mathbf{A}+\mathbf{A}^{\top}\right) \frac{\partial \mathbf{x}}{\partial \mathbf{z}} $$ Proof: This is an obvious application of Proposition 12. q.e.d.
Proposition 14 For the special case where $\mathbf{A}$ is a symmetric matrix and $$ \alpha=\mathbf{x}^{\top} \mathbf{A} \mathbf{x} $$ where $\mathbf{x}$ is $n\times 1, \mathbf{A}$ is $n\times n$, and $\mathbf{x}$ is a function of the vector $\mathbf{z}$, while $\mathbf{A}$ does not depend on $\mathbf{z}$. Then $$ \frac{\partial \alpha}{\partial \mathbf{z}}=2 \mathbf{x}^{\top} \mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}} $$ Proof: This is an obvious application of Proposition 13. q.e.d.
Definition 5 Let $\mathbf{A}$ be a $m \times n$ matrix whose elements are functions of the scalar parameter $\alpha$. Then the derivative of the matrix $\mathbf{A}$ with respect to the scalar parameter $\alpha$ is the $m \times n$ matrix of element-by-element derivatives: $$ \frac{\partial \mathbf{A}}{\partial \alpha}=\left[\begin{array}{cccc} \frac{\partial a_{11}}{\partial \alpha} & \frac{\partial a_{12}}{\partial \alpha} & \ldots & \frac{\partial a_{1 n}}{\partial \alpha} \\ \frac{\partial a_{21}}{\partial \alpha} & \frac{\partial a_{22}}{\partial \alpha} & \ldots & \frac{\partial a_{2 n}}{\partial \alpha} \\ \vdots & \vdots & & \vdots \\ \frac{\partial a_{m 1}}{\partial \alpha} & \frac{\partial a_{m 2}}{\partial \alpha} & \ldots & \frac{\partial a_{m n}}{\partial \alpha} \end{array}\right] $$ Proposition 15 Let $\mathbf{A}$ be a nonsingular, $m\times m$ matrix whose elements are functions of the scalar parameter $\alpha$. Then $$ \frac{\partial \mathbf{A}^{-1}}{\partial \alpha}=-\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \alpha} \mathbf{A}^{-1} $$ Proof: Start with the definition of the inverse $$ \mathbf{A}^{-1} \mathbf{A}=\mathbf{I} $$ and differentiate, yielding $$ \mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \alpha}+\frac{\partial \mathbf{A}^{-1}}{\partial \alpha} \mathbf{A}=\mathbf{0} $$ rearranging the terms yields $$ \frac{\partial \mathbf{A}^{-1}}{\partial \alpha}=-\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \alpha} \mathbf{A}^{-1} $$ q.e.d.
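Proposition 15 is easy to confirm numerically with a concrete parametrized matrix (the particular $\mathbf A(\alpha)$ below is an arbitrary choice that is nonsingular near the chosen point):

```python
import numpy as np

eps = 1e-6
A  = lambda a: np.array([[1.0 + a, 2.0], [a ** 2, 3.0 - a]])  # A(alpha)
dA = lambda a: np.array([[1.0, 0.0], [2 * a, -1.0]])          # elementwise dA/da

a = 0.3
# central difference of the inverse vs. the closed form -A^{-1} (dA/da) A^{-1}
lhs = (np.linalg.inv(A(a + eps)) - np.linalg.inv(A(a - eps))) / (2 * eps)
rhs = -np.linalg.inv(A(a)) @ dA(a) @ np.linalg.inv(A(a))
assert np.allclose(lhs, rhs, atol=1e-6)
```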
Last edited by hbghlyj on 2022-06-28 09:07
