
[Functions] Why does the textbook compute the precision that way when using Newton's method to find an approximate solution of an equation?


hjfmhh Post time 2021-12-4 19:34
[Image attachment: 微信图片_20211204193128.jpg]

Comments

The relative error is the ratio of the absolute error to the true value: $z=\left|\frac{x_1-x_0}{x_0}\right|$  Post time 2023-4-30 20:37


hbghlyj Post time 2022-4-1 10:27
Last edited by hbghlyj at 2023-4-30 14:07

Root-finding Algorithms (Day 3) How Mathematica Does It: Newton's Method

In our last two classes, we've constructed algorithms that can find all of the roots of a given function.  From a pure, non-applied standpoint, this is pretty satisfying: we had a question (how do we find roots?) and we created a foolproof algorithm (Sturm's theorem) that answers it completely.

However, in practice, this is only the beginning of the problem.  Being \textbf{able} to find the roots, while important, is of relatively little value if our algorithm doesn't run \textbf{quickly enough} to actually be of use!  In our last lecture, we're going to look at Newton's method, the root-finding technique that programs like Mathematica actually use to calculate roots.

A New Challenger Appears: Newton's Method

The motivations for Newton's method are twofold:
  • In our two earlier root-finding algorithms, we needed to know the value of our function at \textbf{two} points, and furthermore know that the signs of our function differed at those two points.  This is often not information we will have: e.g. for a function like $x^4 - 2x^3 - 3x^2 + 4x + 4$, there are roots (if you factor, it's $(x+1)^2(x-2)^2$) but in fact no places where the function is actually \textbf{negative}!  We want to get around this issue.
    [Figure: day2_even_degree_root.svg]
  • On the other hand: we saw that with the false-position method, using the data from multiple points on our curve can help us find an approximation more quickly.  Effectively, we used the data at two points to create a \textbf{linear approximation} to our function, and used that to help us come up with better guesses for our roots.  We want to use this observation in later algorithms!


So: let's suppose we're not in a situation where we know what's happening at two points: i.e. we have some function $f(x)$, and we're trying to find a root of $f(x)$ starting from just some initial guess $a$.  How can we do this?  

Well: if we use our second observation, a natural thing to try is \textbf{approximating $f(x)$ by a line} near this point $a$.  We can do this with the derivative, by finding a line with slope $f'(a)$ through the point $(a,f(a))$:
\begin{align*}
f'(a) \cdot (x-a) + f(a) = y.
\end{align*}
If this linear equation is a good approximation to our function, we might hope that a root of this line is close — or at least closer – to a root of $f(x)$ than our original guess $a$ was.  If we solve for its root, we get
\begin{align*}
0 &= f'(a) \cdot (x-a) + f(a)\\
\Rightarrow x & = a - \frac{f(a)}{f'(a)},\\
\end{align*}
provided that $f'(a)$ is nonzero.  This value, hopefully, is a \text{``better guess''} at a root of $f(x)$, and iterating this process should lead us to a root of $f(x)$ itself!
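As a quick illustration, here is a minimal Python sketch of this iteration (the names `newton_step`, `f`, and `fprime`, and the example $x^2-2$, are my own choices, not anything prescribed above):

```python
def newton_step(f, fprime, a):
    """One step of Newton's method: the root of the tangent line at a."""
    return a - f(a) / fprime(a)

# Example: f(x) = x^2 - 2, whose positive root is sqrt(2) = 1.41421356...
f = lambda x: x**2 - 2
fprime = lambda x: 2 * x

a = 1.0
for _ in range(5):
    a = newton_step(f, fprime, a)
print(a)  # approximately 1.414213562373095
```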

A potentially new issue with this method is that there's no particularly clear reason why this process will eventually converge on a given point.  For example, consider performing this process on the function $f(x) = x^4 +x^3 - 2x^2  - 1$ starting at $x=0$.  The derivative of this function, $f'(x)$, is $4x^3 +3x^2 -4x$, which is $0$ at $x=0$.  We can't even \textbf{start}.  

Something of a problem, yes.  However, this isn't such a big problem: if we just wiggle our guess a little bit, we'll wind up with a nonzero derivative (as zeroes of the derivative are rather rare if the function we're studying is, say, a nonconstant polynomial).

A larger problem, even if we can get the method started and don't run into zero derivatives, is that it still might not converge on a given value.  Consider the function $x^3 - 2x + 2$, along with the starting point $x=0$:


\begin{tabular}{c|c|c}
starting point $a$ &$f'(a)$ & root of linear approx. \\
\hline \hline $0$ & $-2$ & $0 - \frac{2}{-2} = 1.$\\
$1$ & $1$ & $1 - \frac{1}{1} = 0.$\\
$0$ & $-2$ & $0 - \frac{2}{-2} = 1.$\\
$1$ & $1$ & $1 - \frac{1}{1} = 0.$\\
$0$ & $-2$ & $0 - \frac{2}{-2} = 1.$\\
$1$ & $1$ & $1 - \frac{1}{1} = 0.$\\
\vdots & \vdots & \vdots \\
\end{tabular}


Not exactly useful.  If we graph this function, we can get a visual idea for why this pathological behavior happens:
[Figure: day3_newton_fail_2.svg]
Any function with tangent lines that form the \text{``criss-cross''} pattern above, where the zeroes of various tangent lines and the initial points of other tangent lines share the same \text{$x$-co\"ordinate}, will run into the same problem.  More complicated sequences that eventually repeat can be made using this idea.
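One can check this cycling numerically; here is a quick sketch (my own code, with the same $f$ and starting point as the table above):

```python
# f(x) = x^3 - 2x + 2, starting from x = 0: Newton's method cycles 0, 1, 0, 1, ...
f = lambda x: x**3 - 2 * x + 2
fprime = lambda x: 3 * x**2 - 2

a = 0.0
for k in range(6):
    print(k, a)                 # prints 0.0, 1.0, 0.0, 1.0, ...
    a = a - f(a) / fprime(a)
```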

So far, so bad.  However, these problems shouldn't make you give up all hope!  Here's an example where the situation is somewhat rosier:

Example
Approximate a root to $f(x) = x^4 - 4x^3 + 2x -1$, starting from $x=-1$.

Answer.
First, we note that $f'(x) = 4x^3 - 12x^2 + 2$.  If we apply Newton's method starting from $x=-1$, we get the following table of approximations:


\begin{tabular}{c|c|c}
starting point $a$ &$f'(a)$ & root of linear approx. \\
\hline \hline $-1$ & $-14$ & $(-1) - \frac{2}{-14} = -\frac{12}{14} \approx -.85714 $\\
$-\frac{12}{14}$ & $\approx -9.33528$ & $\left( - \frac{12}{14} \right) - \frac{.34444}{-9.33528} \approx -.82025$\\
$-.82025$ & $\approx -8.28121$ & $\left( - .82025 \right) - \frac{.0196632}{-8.28121} \approx -.81788$\\
$-.81788$ & $\approx -8.21554$ & $\left( - .81788 \right) - \frac{1.14591 \cdot 10^{-4}}{-8.21554} \approx -.81787$\\
\vdots & \vdots & \vdots \\
\end{tabular}

Further approximations are useless, as we've already — after just four runs of our algorithm! — hit as many significant digits as we're using in our calculations.  By comparison, it would have taken us 17 runs of the bisection method to have found an approximation as close as this one to a root, assuming we started with an interval of length 1.  
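If you want to reproduce the table above, a short sketch along these lines (my own code, same $f$ and starting point) will do it:

```python
f = lambda x: x**4 - 4 * x**3 + 2 * x - 1
fprime = lambda x: 4 * x**3 - 12 * x**2 + 2

a = -1.0
for k in range(5):
    print(f"a_{k} = {a:.5f},  f(a_{k}) = {f(a):.6g},  f'(a_{k}) = {fprime(a):.6g}")
    a = a - f(a) / fprime(a)   # a_1 ~ -0.85714, a_2 ~ -0.82025, a_3 ~ -0.81788, ...
```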


Hopefully this is motivation for why we should try to understand this method more: while it's not foolproof, it is \textbf{ridiculously} fast where it does work.

So: we really \textbf{really} want this to work.  How can we insure this happens — or at least know in what situations this method is viable?

Error Analysis

A more formal way of phrasing the question above is the following: take a polynomial $f(x)$, a starting point $a_0$, and a root $a$ of $f(x)$ that we're hoping that Newton's method will converge to.  Let $a_1$ denote $a_0 - \frac{f(a_0)}{f'(a_0)}$, the result of iterating Newton's method once starting from $a_0$; in general, let $a_k$ denote the result of iterating Newton's method $k$ times starting from $a_0$.  We want to understand how the \textbf{error}
\begin{align*}
e_k := a - a_k
\end{align*}
behaves as $k$ grows large.  In particular, we would like to come up with conditions on $f(x)$ that insure that $e_k$ converges to 0, and furthermore get an estimate of how quickly $e_k$ converges to 0.

To start, we should attempt to describe an error $e_{k+1}$ in terms of earlier errors $e_k$, because these errors (like the approximations $a_k$ themselves) are iterative in design:

\begin{align*}
e_{k+1} &= a - a_{k+1}\\
&= a - \left( a_k - \frac{f(a_k)}{f'(a_k)} \right)\\
&= (a - a_k) +\frac{f(a_k)}{f'(a_k)} \\
&= e_k + \frac{f(a_k)}{f'(a_k)}. \\
\end{align*}

This tells us, somewhat, how to express how our errors are changing over time: we're adding $\frac{f(a_k)}{f'(a_k)}$ at each step.  But this isn't necessarily very useful; we don't really know what this quantity is!  We want to express $\frac{f(a_k)}{f'(a_k)}$ in terms of these errors as well.

To do this, first expand[Typically, we write polynomials in the form $\sum_{i=0}^n b_ix^i$, for some constants $b_i$.  However, this is not the only way to write a polynomial!  For example, we can write $x^2 - 5x + 7$ as $(x-2)^2 - (x-2) + 1$.  In general, given any $a$, we can write a given polynomial in the form $\sum_{i=0}^n c_i (x-a)^i$, by picking appropriate constants.  If you haven't seen how to do this, come find me and I'll explain!] our polynomial $f(x)$ around the approximation $a_k$: in other words, find constants $b_0, \ldots, b_n$ such that
\begin{align*}
f(x) = b_0 + b_1 \cdot (x-a_k) + b_2 \cdot (x-a_k)^2 + \ldots + b_n \cdot (x-a_k)^n.
\end{align*}

Because $a$ is a root of $f(x)$, we know that $f(a) = 0$.  On the other hand, though, if we use our expansion, we have that
\begin{align*}
f(a) &= b_0 + b_1 \cdot (a-a_k) + b_2 \cdot (a-a_k)^2 + \ldots + b_n \cdot (a-a_k)^n \\
&= b_0 + b_1 \cdot e_k + b_2 \cdot e_k^2 + \ldots + b_n \cdot e_k^n.
\end{align*}

What are these $b_i$'s?  Well: $b_0$, in a sense, is what happens when we plug $a_k$ into our polynomial, because
\begin{align*}
f(a_k) = b_0 + b_1 \cdot (a_k-a_k) + b_2 \cdot (a_k-a_k)^2 + \ldots + b_n \cdot (a_k-a_k)^n = b_0.
\end{align*}

In other words, $b_0 = f(a_k)$.  Similarly, $b_1$ is just what we get when we plug $a_k$ into the derivative of our function, as
\begin{align*}
f'(x) &= \frac{d}{dx}\left(b_0 + b_1 \cdot (x-a_k) + b_2 \cdot (x-a_k)^2 + \ldots + b_n \cdot (x-a_k)^n\right)\\
&=  b_1 \cdot \left( \frac{d}{dx} \left( x- a_k\right) \right) + b_2 \cdot \left( \frac{d}{dx} \left( x- a_k\right)^2 \right) + \ldots + b_n \cdot \left( \frac{d}{dx} \left( x- a_k\right)^n \right)\\
&= b_1 \cdot 1 +  b_2 \cdot \left( 2\left( x- a_k\right) \right) + \ldots + b_n \cdot \left( n\left( x- a_k\right)^{n-1} \right)\\
\Rightarrow f'(a_k) &= b_1 \cdot 1 +  b_2 \cdot \left( 2\left( a_k- a_k\right) \right) + \ldots + b_n \cdot \left( n\left( a_k - a_k\right)^{n-1} \right)\\
&= b_1.\\
\end{align*}
So $f'(a_k) = b_1$.  Via a similar process, you can show (and should if you're skeptical!) that $f''(a_k) = 2b_2$.
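If you are skeptical and don't feel like doing the algebra by hand, here is a small symbolic check (a sketch using sympy; the particular polynomial is just an arbitrary example):

```python
import sympy as sp

x, ak = sp.symbols('x a_k')
f = x**4 - 4 * x**3 + 2 * x - 1      # an arbitrary example polynomial, degree n = 4

# Taylor-expanding f around a_k gives coefficients b_i = f^(i)(a_k) / i!,
# so b_0 = f(a_k), b_1 = f'(a_k), and 2*b_2 = f''(a_k).
expansion = sum(sp.diff(f, x, i).subs(x, ak) / sp.factorial(i) * (x - ak)**i
                for i in range(5))
print(sp.expand(expansion - f))      # prints 0: the expansion really does equal f
```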

Why do we care?  Well: we've now created an expression relating the error terms $e_k$ to the function $f(x)$ and its derivatives at $a_k$!  In particular, we've shown that
\begin{align*}
f(a) &= b_0 + b_1 \cdot e_k + b_2 \cdot e_k^2 + \ldots + b_n \cdot e_k^n\\
&= f(a_k) + f'(a_k) \cdot e_k + \frac{f''(a_k)}{2} \cdot e_k^2 + O(e_k^3).\\
\end{align*}

Here, $O(e_k^3)$ is a way of collecting together all of the terms that have three or more factors of $e_k$ in them. The idea is that if $e_k$ is something relatively small, then the terms $b_3e_k^3, b_4e_k^4, \ldots$ will all be really really small as compared to the $f(a_k) + f'(a_k) \cdot e_k + \frac{f''(a_k)}{2} \cdot e_k^2$ terms, as they all contain additional factors of $e_k$, a very small thing!  Often in the manipulations below, we'll wind up multiplying or dividing this quantity by constants.  When we do this, we won't actually write these constants down next to the $O(e_k^3)$ term: i.e. we'll do things like write $2 \cdot O(e_k^3) = O(e_k^3)$.  This is because a constant multiple of a quantity bounded by a constant times $e_k^3$ is still bounded by a (different) constant times $e_k^3$, so the $O(e_k^3)$ notation simply absorbs such constants.

So: as we noted earlier, because $a$ is a root of $f(x)$, we have $f(a) = 0$.  If we apply this to the expression above, we have

\begin{align*}
0 = f(a) &= f(a_k) + f'(a_k) \cdot e_k + \frac{f''(a_k)}{2} \cdot e_k^2 + O(e_k^3)\\
\Rightarrow -f(a_k) &= f'(a_k) \cdot e_k + \frac{f''(a_k)}{2} \cdot e_k^2 + O(e_k^3)\\
\Rightarrow -\frac{f(a_k)}{f'(a_k)} &= e_k + \frac{f''(a_k)}{2f'(a_k)}e_k^2 + O(e_k^3).\\
\end{align*}

If we plug this into our earlier expression $e_{k+1} = e_k + \frac{f(a_k)}{f'(a_k)}$ we get

\begin{align*}
e_{k+1} &= e_k + \frac{f(a_k)}{f'(a_k)}\\
&= e_k - \left( e_k + \frac{f''(a_k)}{2f'(a_k)}e_k^2 + O(e_k^3) \right)\\
&= -\frac{f''(a_k)}{2f'(a_k)}e_k^2 + O(e_k^3)\\
\Rightarrow |e_{k+1}| &= \left|\frac{f''(a_k)}{2f'(a_k)}e_k^2 + O(e_k^3) \right|. \\
\end{align*}
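Here is a small numerical sanity check of this relation (my own sketch; $x^2-2$ and its root $\sqrt2$ are just a convenient example, and `decimal` is used so the tiny errors don't vanish into floating-point roundoff):

```python
from decimal import Decimal, getcontext
getcontext().prec = 80               # 80 significant digits

# f(x) = x^2 - 2, root a = sqrt(2); the prediction is |e_{k+1}| / e_k^2 -> |f''(a) / (2 f'(a))|,
# which here is 2 / (4*sqrt(2)) ~ 0.35355.
f = lambda x: x * x - 2
fprime = lambda x: 2 * x
root = Decimal(2).sqrt()

a = Decimal(1)
e_prev = root - a
for k in range(6):
    a = a - f(a) / fprime(a)
    e = root - a
    print(k, abs(e / (e_prev * e_prev)))   # 0.5, 0.333..., 0.3529..., 0.35355..., ...
    e_prev = e
```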

So.  Suppose for the moment that on the entire region we make guesses in, we satisfy the following properties:
  1. $f'(x)$ is bounded away from 0.
  2. $f''(x)$ is bounded away from infinity.

As a consequence of these first two properties, we can bound $\left|\frac{f''(a_k)}{2f'(a_k)}\right|$ above by some constant $C$ on the interval we're studying.

We also ask for one more property:
  3. Our starting point $a_0$ is \text{``close''} to the root $a$.  By \text{``close,''} we mean that $a_0$ is sufficiently close to $a$ such that
    (a) the $O(e_0^3)$ term is smaller than the $Ce_0^2$ term.  This stops it from accidentally \text{``taking over''} our polynomial, and insures that the $\frac{f''(a_k)}{2f'(a_k)}e_k^2$ term is the relevant one to pay attention to;
    (b) $2C|e_0| < 1$.  This insures that our errors don't increase when we iterate Newton's method.

If we use our third property, we can see that our initial guess $a_0$ insures that the quantity $\left|-\frac{f''(a_0)}{2f'(a_0)}e_0^2 + O(e_0^3) \right|$ is bounded above by $2Ce_0^2$.  Consequently, $|e_1| \leq 2Ce_0^2 < |e_0|$.  As a result, $e_1$ satisfies the two properties 3(a), 3(b) as well; so by the same logic, we have $|e_2| \leq 2Ce_1^2$.  Induction tells us that this property holds for \textbf{all} $e_k$: i.e.
\begin{align*}
|e_{k+1}| \leq 2Ce_{k}^2.
\end{align*}



This, by the way, is \textbf{amazing.}  To get an idea of how amazing this is, think about how quickly errors improved in the method of bisections.  If our error was $e_k$ at one stage, the error at the next stage was $e_k/2$, because our algorithm runs by repeatedly subdividing intervals.  In a sense, this improvement is \text{``linear''} in terms of the total number of bits gained: at each step, we gain precisely one binary bit of additional information on our root.

Newton's method, by contrast, is \text{``quadratic''} in terms of the total number of bits gained.  If we have an estimate that's accurate to within $10^{-5}$ of a root, running Newton just one more time shrinks that error margin by a factor of roughly $2C \cdot 10^{-5}$, down to about $2C \cdot 10^{-10}$!  In other words, up to the constant multiple on the outside, we're effectively \textbf{doubling} the number of bits of accuracy on our estimate with each step.  That is \textbf{ridiculous}.
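To see the digit-doubling concretely, the same sort of sketch as before works (again my own example with $x^2-2$; `decimal` keeps enough digits to watch the error shrink):

```python
from decimal import Decimal, getcontext
getcontext().prec = 60

f = lambda x: x * x - 2
fprime = lambda x: 2 * x
root = Decimal(2).sqrt()

a = Decimal(1)
for k in range(7):
    print(k, abs(a - root))          # the number of correct digits roughly doubles each step
    a = a - f(a) / fprime(a)
```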

And we've discovered reasonable conditions to check for to insure this happens!  If we want to use Newton's method on an interval to find a root, we just need to insure that the following holds:
  • $f'(x)$ is bounded away from 0 on our interval.  If our root $a$ is simple, i.e. $(x-a)^2$ is not a factor of $f(x)$, this is easy to insure by simply taking a sufficiently close interval around the root we're looking for.  This is because the derivative is nonzero near such a root, as we repeatedly observed in our lecture yesterday!  (If our root is not simple, i.e. $(x-a)^2$ is a factor of $f(x)$, Newton's method actually is pretty slow, and we're much better off with other methods.)
  • $f''(x)$ doesn't blow up to infinity, which as long as we're working with polynomials is free.
  • $a_0$ is \text{``close''} to $a$, as described above: these are relatively easy conditions to check and insure by simply shrinking our interval (with i.e. the method of bisections or Sturm's theorem) until they hold.



Now that we understand when Newton's method is guaranteed to converge, we can finally formalize it below:

Root-Finding Algorithm 4: Newton's Method

  • [Input:] A continuous function $f(x)$ and an error tolerance $\epsilon$.  As well, an interval $[c,d]$ and a guess $a_0$ that satisfy the \text{``niceness''} conditions listed above.  Let $C$ be the constant defined earlier, and request that $d-c < \frac{1}{2C}$ as well.
  • Let $k$ be the current number of passes we've made using Newton's method, and $e_k$ be the error defined in our discussions above.  Initially, we won't know what the root we're converging to is, so we won't know $e_0$ exactly; but we can bound it above by $d-c$, which we know is less than $\frac{1}{2C}$.  Using the same iterated argument as before, we can still conclude that our errors follow the $|e_{k+1}| \leq 2Ce_{k}^2$ property.
  • If our bound on $e_{k}$ is less than $\epsilon$, halt: we have an approximation that's within $\epsilon$ of a root.
  • Otherwise, set $a_{k+1} = a_k - \frac{f(a_k)}{f'(a_k)}$, increment $k$, and return to the previous step.  (A short code sketch of this loop appears below.)
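Here is a minimal Python sketch of that loop (my own code, not a canonical implementation; the caller is assumed to supply the constant `C` and an interval and guess satisfying the \text{``niceness''} conditions above):

```python
def newton(f, fprime, a0, c, d, C, eps, max_iter=100):
    """Newton's method on [c, d], assuming the 'niceness' conditions hold.

    C bounds |f''(x) / (2 f'(x))| on [c, d], and we need d - c < 1/(2C).
    We track the guaranteed error bound via e_{k+1} <= 2*C*e_k^2, with e_0 <= d - c.
    """
    a, e_bound = a0, d - c
    for _ in range(max_iter):
        if e_bound < eps:
            return a                      # within eps of a root, by the error bound
        a = a - f(a) / fprime(a)          # one Newton step
        e_bound = 2 * C * e_bound**2      # update the guaranteed bound on the error
    return a

# Example: a root of x^2 - 2 on [1, 1.5]; |f''/(2 f')| = 1/(2x) <= 1/2 there, so C = 0.5.
print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.25, 1.0, 1.5, 0.5, 1e-12))  # ~ 1.41421356...
```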


This, effectively, is how most modern root-finding programs actually find roots.  Using this in conjunction with Sturm's theorem (to identify all the roots) and the bisection method (to improve our guesses to the point where Newton takes over) is all you need to construct fast, beautiful root-finding programs — with these three classes, you're now capable of (mostly) replicating the fastest algorithms we use today.  Cool, right?





Homework 3: Newton's Method

  1. Use Newton's method to find a root of $x^6 -x^3 + 4x - 1$, starting at $1$.
  2. In class, we studied an example of Newton's method where our process kept looping between two values.  (In particular, $x^3 - 2x + 2$, starting at $x=0$, kept bouncing between $x=0$ and $x=1$.)  Find a function $f(x)$ and a starting point $a$ such that applying Newton's method starting from $a$ enters a loop between three values.
  3. Find a function $f(x)$ and a starting point $a$ so that applying Newton's method to $f(x)$ starting from $a$ yields a sequence of approximations that diverges to infinity.
  4. Is it possible to find a \textit{polynomial} $p(x)$ and starting point $a$ so that applying Newton's method to $p(x)$ starting at $a$ yields a sequence of approximations that diverges to infinity?  Or must any such sequence be bounded?
  5. Consider the following algorithm, the \textbf{secant method}, which is in some senses a \text{``discrete''} form of Newton's method (a short code sketch of the iteration appears after this homework set):

    • [Input:] A continuous function $f(x)$ and a pair of guesses $a_0, a_1$ for a root of $f(x)$.
    • To form a third guess $a_2$, draw the line between $(a_0, f(a_0))$ and $(a_1, f(a_1))$.  This line has slope $\frac{f(a_1) - f(a_0)}{a_1 - a_0}$, and goes through the point $(a_1, f(a_1))$.  Therefore, it has the form
      \begin{align*}
      y = \frac{f(a_1) - f(a_0)}{a_1 - a_0}(x-a_1) + f(a_1).
      \end{align*}
    • Solving for the root of this line tells us that our third guess, $a_2$, is precisely
      \begin{align*}
      a_2 := a_1 - f(a_1)\frac{a_1 - a_0}{f(a_1) - f(a_0)}.\\
      \end{align*}
    • We now have a new third guess $a_2$!  Take $a_1$ and $a_2$, our most recent two guesses, and repeat this process on them to generate another guess $a_3$.  Iterate to generate a sequence $a_k$ of guesses.

    Perform error analysis on this algorithm in a similar fashion to the one that we did for Newton's method.  If $a$ is the root that the sequence $a_k$'s is hopefully converging to, and $e_k = a - a_k$, show (with some assumptions about what \text{``niceness''} means) that $|e_{k+1}| \leq C |e_k|^\varphi$.  Note that $\varphi$, here, is the golden ratio $\frac{1+ \sqrt{5}}{2}$.
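As mentioned above, here is a minimal sketch of the secant iteration itself (my own code; the error analysis asked for in the problem is not done here):

```python
def secant(f, a0, a1, eps=1e-12, max_iter=100):
    """The secant method: Newton's method with f'(a_k) replaced by a difference quotient."""
    for _ in range(max_iter):
        if abs(f(a1)) < eps:              # crude stopping rule on the residual
            break
        a0, a1 = a1, a1 - f(a1) * (a1 - a0) / (f(a1) - f(a0))
    return a1

# Example: the polynomial from the worked example earlier in the notes.
print(secant(lambda x: x**4 - 4 * x**3 + 2 * x - 1, -1.0, -0.9))   # ~ -0.81787
```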
Day 2: Sturm's theorem


hbghlyj Post time 2024-2-17 23:05
cjhb.site/106/1.html
These lecture notes introduce Newton's method by "optimizing fixed-point iteration" (so that its convergence near a simple root is quadratic).


hbghlyj Post time 2024-2-17 23:10

Newton's method converges quadratically near a simple root, but only linearly near a multiple root.

MSE
Let $r$ be a double root of $f(x)=0$, i.e. $f(r)=f'(r)=0$, $f''(r)\ne 0$, and assume $f''$ is continuous in a neighbourhood of $r$.  Show that the error $e_n$ of Newton's method satisfies $e_{n+1} \approx \frac12 e_n$, and conclude that Newton's method converges only linearly near a double root.  More generally, if $r$ has multiplicity $m$, then $e_{n+1} \approx \frac{m-1}{m}e_n$.
At a simple root of a sufficiently smooth $f$ you get quadratic convergence close to the root, that is $e_{n+1}\approx Ce_n^2$ if $e_n$ is small enough. At a multiple root or far away from a cluster of roots the convergence is linear, the worse the higher the multiplicity. You are to quantify this slow convergence.



Let $r$ be a root of multiplicity $m$. Then one can extract $m$ linear factors $(x-r)$ from $f$, so that $f(x)=(x-r)^mg(x)$, $g(r)\ne 0$, $g$ at least differentiable. Then $$f'(x)=m(x-r)^{m-1}g(x)+(x-r)^mg'(x)$$ and the Newton step gives
\begin{align*}
x_{n+1}-r &= x_n-r-\frac{(x_n-r)^mg(x_n)}{m(x_n-r)^{m-1}g(x_n)+(x_n-r)^mg'(x_n)}\\
&=\frac{(m-1)g(x_n)+(x_n-r)g'(x_n)}{mg(x_n)+(x_n-r)g'(x_n)}(x_n-r)
\end{align*}
which implies, writing $e_n = x_n - r$ and using $g(x_n)=g(r)+g'(r)e_n+\ldots$,
\begin{align*}
e_{n+1}
&=\frac{(m-1)g(r)+me_ng'(r)+O(e_n^2)}{mg(r)+(m+1)e_ng'(r)+O(e_n^2)}e_n
\\[1em]
&=\frac{m-1}{m}\frac{m(m-1)g(r)+m^2e_ng'(r)+O(e_n^2)}{m(m-1)g(r)+(m-1)(m+1)e_ng'(r)+O(e_n^2)}e_n
\\[1em]
&=\frac{m-1}{m}\left(1+\frac{g'(r)+O(e_n)}{m(m-1)g(r)+O(e_n)}e_n\right)e_n
\\[1em]
&=\frac{m-1}{m}e_n+\frac{g'(r)}{m^2\,g(r)}e_n^2+O(e_n^3)
\end{align*}
which should lead directly to the claim of your task.
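A quick numerical illustration of this linear convergence (my own sketch; $f(x)=(x-1)^2(x+2)$ is just an example with a double root at $r=1$, so the predicted error ratio is $(m-1)/m = 1/2$):

```python
# f has a double root at r = 1: f(x) = (x - 1)^2 (x + 2).
f = lambda x: (x - 1)**2 * (x + 2)
fprime = lambda x: 2 * (x - 1) * (x + 2) + (x - 1)**2

r, a = 1.0, 2.0
e_prev = abs(a - r)
for k in range(12):
    a = a - f(a) / fprime(a)
    e = abs(a - r)
    print(k, e, e / e_prev)      # the ratio e_{n+1} / e_n tends to 1/2, i.e. (m-1)/m with m = 2
    e_prev = e
```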


hbghlyj Post time 2024-2-17 23:28
Page 282 of Calculus and Analytic Geometry gives a condition but does not prove it:\begin{equation}\label{d}\left|\frac{f(x) f''(x)}{\left[f'(x)\right]^{2}}\right|<1\end{equation} [Image attachment: 22032501276a3a9d1ebce31df0.png]

How can this convergence condition be proved?
I noticed that the left-hand side of \eqref{d} is exactly the derivative of $x-f(x)/f'(x)$, so the condition is equivalent to saying that the derivative of $x-f(x)/f'(x)$ has absolute value less than 1.
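The identity behind that observation is easy to check symbolically; here is a sketch using sympy (with a generic symbolic function $f$):

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')

# d/dx [ x - f(x)/f'(x) ]  should equal  f(x) f''(x) / f'(x)^2.
g = x - f(x) / sp.diff(f(x), x)
lhs = sp.diff(g, x)
rhs = f(x) * sp.diff(f(x), x, 2) / sp.diff(f(x), x)**2
print(sp.simplify(lhs - rhs))   # prints 0
```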


hbghlyj Post time 2024-2-17 23:35
hbghlyj wrote at 2024-2-17 15:28:
This is equivalent to the derivative of $x-f(x)/f'(x)$ having absolute value less than 1.


If the absolute value of the derivative is less than 1, then it is a contraction mapping, so by the fixed-point theorem the function $x-f(x)/f'(x)$ has a fixed point.
