1. Introduction
Consider the system of nonlinear equations

F(x) = 0, (1.1)

where F: Rⁿ → Rⁿ is a continuously differentiable function. In this paper, we assume that the solution set of (1.1) is nonempty and denote it by X∗. Moreover, we write the Jacobian F′(x) as J(x) and use the abbreviations Fk = F(xk), Jk = J(xk).
The Levenberg-Marquardt (LM) method is one of the most important algorithms for solving (1.1). At every iteration, the LM method computes the trial step dk by solving the linear system

(JkᵀJk + μkI)dk = −JkᵀFk, (1.2)

where μk is the LM parameter, which plays an important role in the analysis of the convergence rate of the LM method. For example, Yamashita and Fukushima [13] proved that the LM method taking μk = ‖Fk‖² converges quadratically under the local error bound condition, which is weaker than nonsingularity. Fan and Yuan [6] proved that the LM method taking μk = ‖Fk‖^δ with δ ∈ [1,2] still achieves quadratic convergence under the local error bound condition. More research on the LM method can be found in [1,11,14,15,16] and the references therein.
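To make the iteration concrete, the following minimal numpy sketch performs one exact LM step with the choice μk = ‖Fk‖² of [13]; the callables F and J and all names are illustrative, not taken from the original paper.

```python
import numpy as np

def lm_step(F, J, x):
    """One exact Levenberg-Marquardt step with mu_k = ||F_k||^2 as in [13].

    F and J are callables returning F(x) in R^n and the Jacobian J(x) in
    R^{n x n}; the trial step d_k solves (J_k^T J_k + mu_k I) d = -J_k^T F_k.
    """
    Fk = F(x)
    Jk = J(x)
    mu = np.linalg.norm(Fk) ** 2          # LM parameter mu_k = ||F_k||^2
    A = Jk.T @ Jk + mu * np.eye(x.size)   # positive definite when mu > 0
    d = np.linalg.solve(A, -Jk.T @ Fk)    # exact solve of the system (1.2)
    return x + d
```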
The LM method solves the linear system (1.2) exactly at every iteration, which may be very expensive for large-scale nonlinear equations. The inexact approach is one way to overcome this difficulty. In the inexact LM method, the direction dk is given by the solution of the perturbed system

(JkᵀJk + μkI)dk = −JkᵀFk + pk, (1.3)

where pk ∈ Rⁿ is a perturbation vector which measures how inexactly the linear system (1.2) is solved. Under the nonsingularity assumption, Facchinei and Kanzow [3] proved that if μk → 0 and ‖pk‖ = o(‖JkᵀFk‖), then the inexact LM method converges superlinearly, and that if μk = O(‖JkᵀFk‖) and ‖pk‖ = O(‖JkᵀFk‖²), then it converges quadratically. Suppose that

μk = ‖Fk‖^α, ‖pk‖ = ‖Fk‖^(α+θ) or μk = ‖JkᵀFk‖^α, ‖pk‖ = ‖JkᵀFk‖^(α+θ),

where α > 0 and θ > 0 are constants. Under the local error bound condition, many researchers (e.g., [2,4,5,7]) investigated the convergence rate of the inexact LM method for different values of α and θ. Recently, Wang and Fan [12] studied the convergence rate of the inexact LM method taking μk = ‖Fk‖^α with ‖pk‖ = ‖Fk‖^(α+θ), and μk = ‖JkᵀFk‖^α with ‖pk‖ = ‖JkᵀFk‖^(α+θ), under the Hölderian local error bound condition and the Hölderian continuity of the Jacobian, which are more general than the local error bound condition and the Lipschitz continuity of the Jacobian used in [1,2,4,5,7].
In this paper, we study the convergence rate of a family of inexact LM methods with more general LM parameters and perturbation vectors. We consider

μk = σ‖Fk‖^α + (1 − σ)‖JkᵀFk‖^α, (1.4)

‖pk‖ = τ‖Fk‖^(α+θ) + (1 − τ)‖JkᵀFk‖^(α+θ), (1.5)

where σ, τ ∈ [0,1] and α, θ > 0. We derive an explicit formula for the convergence order under the Hölderian local error bound condition and the Hölderian continuity of the Jacobian. Moreover, we develop a family of inexact LM methods with a nonmonotone line search and prove its global convergence. We also investigate the numerical performance of these inexact LM methods by solving the nonlinear equations arising from the linear complementarity problem.
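The next sketch illustrates how (1.3)-(1.5) can be realized in practice: the LM parameter is the convex combination (1.4), and the normal system is solved by a conjugate-gradient loop whose final residual plays the role of pk, stopped once the combination (1.5) is met. The analysis below takes ‖pk‖ equal to that combination; enforcing it only as an upper bound is our simplification.

```python
import numpy as np

def inexact_lm_step(Fk, Jk, sigma=0.5, tau=0.5, alpha=1.0, theta=1.0):
    """One inexact LM step: solve (J_k^T J_k + mu_k I) d = -J_k^T F_k + p_k
    with mu_k from (1.4), stopping CG once the residual (= -p_k) satisfies
    the combination from (1.5) as an upper bound."""
    g = Jk.T @ Fk                                        # J_k^T F_k
    mu = sigma * np.linalg.norm(Fk) ** alpha \
        + (1.0 - sigma) * np.linalg.norm(g) ** alpha     # (1.4)
    tol = tau * np.linalg.norm(Fk) ** (alpha + theta) \
        + (1.0 - tau) * np.linalg.norm(g) ** (alpha + theta)   # (1.5), as a cap
    A = Jk.T @ Jk + mu * np.eye(len(g))                  # SPD since mu > 0

    # Conjugate gradients on A d = -g; on exit p_k = A d + g = -r, so
    # ||p_k|| = ||r|| <= tol.
    d = np.zeros_like(g)
    r = -g.copy()
    p = r.copy()
    while np.linalg.norm(r) > tol:
        Ap = A @ p
        step = (r @ r) / (p @ Ap)
        d = d + step * p
        r_new = r - step * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return d, mu, np.linalg.norm(r)
```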
The organization of this paper is as follows. In the next section, we investigate the convergence order under the Hölderian local error bound condition and the Hölderian continuity of the Jacobian. In Section 3, we present a family of inexact LM methods with a nonmonotone line search and prove that it is globally convergent. In Section 4, we apply these inexact LM methods to the nonlinear equations arising from the linear complementarity problem and report some numerical results.
2. Convergence rate of the inexact LM methods
In this section, we study the convergence rate of the inexact LM methods with the iteration

xk+1 = xk + dk, (2.1)

where dk is obtained from (1.3) with μk, pk satisfying (1.4) and (1.5). We suppose that the generated sequence {xk} converges to the solution set X∗ and lies in some neighbourhood of x∗ ∈ X∗.
Assumption 2.1. (a) F(x) provides a Hölderian local error bound of order γ ∈ (0,1] on some neighbourhood of x∗ ∈ X∗, i.e., there exist constants κ > 0 and 0 < r < 1 such that

κ dist(x, X∗) ≤ ‖F(x)‖^γ, ∀x ∈ N(x∗, r). (2.2)

(b) J(x) is Hölderian continuous of order υ ∈ (0,1], i.e., there exists a constant L > 0 such that

‖J(y) − J(x)‖ ≤ L‖y − x‖^υ, ∀x, y ∈ N(x∗, r). (2.3)

Here N(x∗, r) := {x ∈ Rⁿ : ‖x − x∗‖ ≤ r}.
It is worth pointing out that if γ = υ = 1, then Assumption 2.1(a) is the local error bound condition and Assumption 2.1(b) is the Lipschitz continuity of the Jacobian. Moreover, by (2.3), we have (see [12])

‖F(y) − F(x) − J(x)(y − x)‖ ≤ (L/(1+υ))‖y − x‖^(1+υ), ∀x, y ∈ N(x∗, r). (2.4)

Furthermore, there exists a constant M > 0 such that

‖J(x)‖ ≤ M and ‖F(x) − F(y)‖ ≤ M‖x − y‖, ∀x, y ∈ N(x∗, r). (2.5)
In the following, we denote by x̄k a vector in X∗ that is closest to xk, i.e.,

‖x̄k − xk‖ = dist(xk, X∗). (2.6)
Now we suppose that the singular value decomposition (SVD) of J(x̄k) is

J(x̄k) = Ūk Σ̄k V̄kᵀ = Ūk,1 Σ̄k,1 V̄k,1ᵀ,

where Σ̄k,1 = diag(σ̄k,1, ..., σ̄k,r) with σ̄k,1 ≥ ⋯ ≥ σ̄k,r > 0. The corresponding SVD of Jk is

Jk = Uk Σk Vkᵀ = Uk,1 Σk,1 Vk,1ᵀ + Uk,2 Σk,2 Vk,2ᵀ,

where Σk,1 = diag(σk,1, ..., σk,r) with σk,1 ≥ ⋯ ≥ σk,r > 0 and Σk,2 = diag(σk,r+1, ..., σk,n) with σk,r+1 ≥ ⋯ ≥ σk,n ≥ 0. For simplicity, we omit the subscript k in Uk,i, Σk,i, Vk,i (i = 1, 2) and write J(x̄k) and Jk as

J(x̄k) = Ū1Σ̄1V̄1ᵀ, Jk = U1Σ1V1ᵀ + U2Σ2V2ᵀ.
By the matrix perturbation theory [8] and (2.3), we have

‖diag(Σ̄1 − Σ1, −Σ2)‖ ≤ ‖J(x̄k) − Jk‖ ≤ L‖x̄k − xk‖^υ,

which gives

‖Σ̄1 − Σ1‖ ≤ L‖x̄k − xk‖^υ and ‖Σ2‖ ≤ L‖x̄k − xk‖^υ. (2.7)
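The perturbation bound invoked here is Mirsky's theorem: the singular values of two matrices differ by no more than the spectral norm of their difference. A small numerical check, with random data standing in for Jk and J(x̄k), illustrates it:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))            # stands in for J(x_bar_k)
E = 1e-3 * rng.standard_normal((6, 6))     # stands in for J_k - J(x_bar_k)

s_A = np.linalg.svd(A, compute_uv=False)
s_AE = np.linalg.svd(A + E, compute_uv=False)

# Mirsky/Weyl: max_i |sigma_i(A+E) - sigma_i(A)| <= ||E||_2. Combined with
# (2.3), the singular values of J_k and J(x_bar_k) differ by at most
# L*||x_bar_k - x_k||^v, which is the content of (2.7).
assert np.max(np.abs(s_AE - s_A)) <= np.linalg.norm(E, 2) + 1e-12
```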
Moreover, by (1.3) we have

dk = −V1(Σ1² + μkI)⁻¹Σ1U1ᵀFk − V2(Σ2² + μkI)⁻¹Σ2U2ᵀFk + (JkᵀJk + μkI)⁻¹pk, (2.8)

and

Fk + Jkdk = μkU1(Σ1² + μkI)⁻¹U1ᵀFk + μkU2(Σ2² + μkI)⁻¹U2ᵀFk + Jk(JkᵀJk + μkI)⁻¹pk. (2.9)
In the following, we suppose without loss of generality that xk lies in N(x∗, r/2).
Lemma 2.1. Under the conditions of Assumption 2.1, if υ > 2/γ − 2, then there exist positive constants a1, a2, b1, b2 such that

a1‖x̄k − xk‖^((2/γ−1)α) ≤ μk ≤ a2‖x̄k − xk‖^α, (2.10)

b1‖x̄k − xk‖^((2/γ−1)(α+θ)) ≤ ‖pk‖ ≤ b2‖x̄k − xk‖^(α+θ). (2.11)
Proof. Since ‖x̄k − x∗‖ ≤ ‖x̄k − xk‖ + ‖xk − x∗‖ ≤ 2‖xk − x∗‖ ≤ r, we have x̄k ∈ N(x∗, r). Hence, by (2.2) and (2.5),

κ^(1/γ)‖x̄k − xk‖^(1/γ) ≤ ‖Fk‖ ≤ M‖x̄k − xk‖. (2.12)

By (2.5), we have

‖JkᵀFk‖ ≤ M‖Fk‖ ≤ M²‖x̄k − xk‖. (2.13)

Let Tk := Fk − F(x̄k) − Jk(xk − x̄k). Then, by (2.4),

‖Tk‖ ≤ (L/(1+υ))‖x̄k − xk‖^(1+υ). (2.14)

It follows from (2.4), (2.12) and (2.14) that

‖JkᵀFk‖ ≥ ‖Fk‖(‖Fk‖ − ‖Tk‖)/‖x̄k − xk‖,

which together with υ > 2/γ − 2 gives

‖JkᵀFk‖ ≥ c̃‖x̄k − xk‖^(2/γ−1), (2.15)

where c̃ > 0 is some constant. By (2.13) and (2.15), we have

c̃‖x̄k − xk‖^(2/γ−1) ≤ ‖JkᵀFk‖ ≤ M²‖x̄k − xk‖. (2.16)

Since μk = σ‖Fk‖^α + (1−σ)‖JkᵀFk‖^α, by (2.12) and (2.16), we have

σκ^(α/γ)‖x̄k − xk‖^(α/γ) + (1−σ)c̃^α‖x̄k − xk‖^((2/γ−1)α) ≤ μk ≤ a2‖x̄k − xk‖^α,

where a1 := σκ^(α/γ) + (1−σ)c̃^α and a2 := σM^α + (1−σ)M^(2α), which together with 2/γ − 1 ≥ 1/γ gives

a1‖x̄k − xk‖^((2/γ−1)α) ≤ μk ≤ a2‖x̄k − xk‖^α.

This proves (2.10). Moreover, since ‖pk‖ = τ‖Fk‖^(α+θ) + (1−τ)‖JkᵀFk‖^(α+θ), according to (2.12) and (2.16), we have

τκ^((α+θ)/γ)‖x̄k − xk‖^((α+θ)/γ) + (1−τ)c̃^(α+θ)‖x̄k − xk‖^((2/γ−1)(α+θ)) ≤ ‖pk‖ ≤ b2‖x̄k − xk‖^(α+θ),

where b1 := τκ^((α+θ)/γ) + (1−τ)c̃^(α+θ) and b2 := τM^(α+θ) + (1−τ)M^(2(α+θ)), which together with 2/γ − 1 ≥ 1/γ gives

b1‖x̄k − xk‖^((2/γ−1)(α+θ)) ≤ ‖pk‖ ≤ b2‖x̄k − xk‖^(α+θ).

This proves (2.11). □
Lemma 2.2. Under the conditions of Assumption 2.1, if υ > 2/γ − 2, 0 < α < 2γ(1+υ)/(2−γ) and θ > (2−2γ)α/γ, then there exists a constant c > 0 such that
Proof. Let

d̄k := −(JkᵀJk + μkI)⁻¹JkᵀFk. (2.18)

Then d̄k is the LM step computed by solving (1.2) exactly. Moreover, by (1.3) we have

dk = d̄k + (JkᵀJk + μkI)⁻¹pk. (2.19)

Now we define

φk(d) := ‖Fk + Jkd‖² + μk‖d‖². (2.20)

Then the LM step d̄k defined by (2.18) is the minimizer of φk(d). By (2.4) and the left inequality in (2.10), we have
where c1 := (L/(1+υ))²a1⁻¹ + 1. Thus, by (2.19), (2.21), the left inequality in (2.10) and the right inequality in (2.11), we have
where c = √(c1 + b2/a1). □
Lemma 2.3. [12, Lemma 2.3] Under the conditions of Assumption 2.1, we have
(i) ‖U1U1ᵀFk‖ ≤ M‖x̄k − xk‖; (ii) ‖U2U2ᵀFk‖ ≤ 2L‖x̄k − xk‖^(1+υ).
Theorem 2.1. Under the conditions of Assumption 2.1, if υ > 2/γ − 2, 0 < α < 2γ(1+υ)/(2−γ) and θ > (2−2γ)α/γ, then the sequence {xk} converges to the solution set X∗ with the order h(α,θ,γ,υ), where
Proof. Since xk converges to X∗, we may assume that L‖x̄k − xk‖^υ ≤ σ̄/2 holds for all sufficiently large k. Then, it follows from (2.7) that
On the other hand, by (2.7) and the left inequality in (2.10), for all sufficiently large k,
Hence, it follows from (2.9)-(2.11), (2.24), (2.25), ‖(Σ2² + μkI)⁻¹‖ ≤ μk⁻¹ and Lemma 2.3 that
where c̄ := 4Ma2/σ̄² + 2L + (2/σ̄)b2 + La1⁻¹b2. Therefore, by (2.2), (2.4), (2.26) and Lemma 2.2, we have
where h(α,θ,γ,υ) is given in (2.23). □
Corollary 2.1. Under the conditions of Assumption 2.1 with γ = υ = 1, if 0 < α < 4, then the sequence {xk} converges to the solution set X∗ with the order h(α,θ), where
More precisely,
and
As is known from [5], for any α ∈ (0,2] and θ ≥ 1, the sequence generated by the inexact LM method (1.3) converges to some solution of (1.1), which is a stronger result than convergence to the solution set. We show that this convergence result also holds for the inexact LM methods studied in this paper.
Theorem 2.2. Under the conditions of Assumption 2.1 with γ = υ = 1, if α ∈ (0,2] and θ ≥ 1, then the sequence {xk} converges to a solution of (1.1) with the order g(α), where
Proof. By Corollary 2.1, when α∈(0,2] and θ≥1, it holds that
where g(α) is defined by (2.27). Then, for all sufficiently large k,
which together with g(α)>1 gives
Hence, we deduce from (2.17), (2.28) and (2.29) that
This implies that the sequence {xk} converges to some solution of (1.1) with the order g(α). □
3. Globally convergent inexact LM methods
In this section, we study a family of globally convergent inexact LM methods. We define the merit function ψ: Rⁿ → R as

ψ(x) := (1/2)‖F(x)‖². (3.1)

Obviously, ψ(x) is continuously differentiable at any x ∈ Rⁿ with ∇ψ(x) = J(x)ᵀF(x). Our method is described as follows.
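As a quick sanity check, the gradient formula ∇ψ(x) = J(x)ᵀF(x) can be verified against central finite differences; the sketch below assumes the least-squares form ψ(x) = ½‖F(x)‖² stated above, with illustrative callables F and J.

```python
import numpy as np

def grad_psi_check(F, J, x, h=1e-6):
    """Compare grad psi(x) = J(x)^T F(x) with central finite differences of
    the assumed merit psi(x) = 0.5*||F(x)||^2; returns the max deviation."""
    psi = lambda y: 0.5 * float(F(y) @ F(y))
    g = J(x).T @ F(x)
    g_fd = np.array([(psi(x + h * e) - psi(x - h * e)) / (2.0 * h)
                     for e in np.eye(x.size)])
    return float(np.max(np.abs(g - g_fd)))
```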
Algorithm 3.1. Choose parameters σ, τ ∈ [0,1], α ∈ (0,4], θ > 0, ρ, ξ, χ, δ, ζ ∈ (0,1) and x0 ∈ Rⁿ. Let Θ0 := ψ(x0). Set k := 0.
Step 1: If ‖∇ψ(xk)‖=0, then stop.
Step 2: Set

μk := σ‖Fk‖^α + (1 − σ)‖∇ψ(xk)‖^α, (3.2)

ηk := τ‖Fk‖^(α+θ) + (1 − τ)‖∇ψ(xk)‖^(α+θ). (3.3)

Find a search direction dk ∈ Rⁿ which satisfies

(JkᵀJk + μkI)dk = −∇ψ(xk) + pk, (3.4)

where

‖pk‖ ≤ min{ρ‖∇ψ(xk)‖, ηk}. (3.5)

If dk satisfies

‖F(xk + dk)‖ ≤ ξ‖F(xk)‖, (3.6)

then set λk := 1 and go to Step 4.
Step 3: If the descent condition

∇ψ(xk)ᵀdk ≤ −χ‖dk‖² (3.7)

is not satisfied, then set dk := −∇ψ(xk). Let lk be the smallest nonnegative integer l satisfying

ψ(xk + δ^l dk) ≤ Θk + ζδ^l∇ψ(xk)ᵀdk. (3.8)

Set λk := δ^(lk) and go to Step 4.
Step 4: Set xk+1 := xk + λkdk and update Θk+1 by (3.9).
Set k:=k+1 and go to Step 1.
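The following Python sketch assembles the steps above into a runnable routine. It is only a partial reconstruction: conditions quoted verbatim in the proofs ((3.4)-(3.8)) are used as stated, the linear system is solved exactly (so pk = 0 trivially satisfies (3.5)), and the update (3.9) for Θk, which we could not recover from the text, is replaced by an assumed convex-combination rule flagged in the comments.

```python
import numpy as np

def psi(Fx):
    # ASSUMED merit: psi(x) = 0.5*||F(x)||^2, consistent with grad psi = J^T F.
    return 0.5 * (Fx @ Fx)

def algorithm_3_1(F, J, x0, sigma=1.0, alpha=1.0,
                  rho=1e-3, xi=0.5, chi=1e-5, delta=0.8, zeta=1e-5,
                  eps=1e-5, max_iter=500):
    """Sketch of Algorithm 3.1 with an exact inner solve, so p_k = 0,
    which satisfies ||p_k|| <= rho*||grad psi(x_k)|| trivially."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    Theta = psi(Fx)                                   # Theta_0 := psi(x_0)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) <= eps:                 # stopping rule of Section 4
            break
        Jx = J(x)
        grad = Jx.T @ Fx                              # grad psi(x_k) = J_k^T F_k
        if np.linalg.norm(grad) == 0.0:               # Step 1
            break
        # Step 2: LM parameter (3.2) and trial step from (3.4).
        mu = sigma * np.linalg.norm(Fx) ** alpha \
            + (1.0 - sigma) * np.linalg.norm(grad) ** alpha
        d = np.linalg.solve(Jx.T @ Jx + mu * np.eye(x.size), -grad)
        if np.linalg.norm(F(x + d)) <= xi * np.linalg.norm(Fx):   # (3.6)
            lam = 1.0
        else:
            # Step 3: descent condition (3.7), else fall back to -grad,
            # then the nonmonotone backtracking search (3.8).
            if grad @ d > -chi * (d @ d):
                d = -grad
            lam = 1.0
            while psi(F(x + lam * d)) > Theta + zeta * lam * (grad @ d):
                lam *= delta
        # Step 4. The update (3.9) is not recoverable from the text; ASSUMED:
        # Theta_{k+1} = (1 - zeta)*psi(x_{k+1}) + zeta*Theta_k, which keeps
        # psi(x_{k+1}) <= Theta_{k+1} as required by Theorem 3.1.
        x = x + lam * d
        Fx = F(x)
        Theta = (1.0 - zeta) * psi(Fx) + zeta * Theta
    return x
```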
Algorithm 3.1 is designed based on the inexact LM method of [2] and the nonmonotone smoothing Newton method of [10]. Its main feature is that it takes a more general LM parameter μk and perturbation vector pk, and adopts a nonmonotone line search technique to ensure global convergence.
Theorem 3.1. Algorithm 3.1 generates an infinite sequence {xk} which satisfies ψ(xk)≤Θk for all k≥0.
Proof. Suppose ψ(xk) ≤ Θk holds for some k. If ∇ψ(xk) ≠ 0, then F(xk) ≠ 0 and hence μk > 0. So the matrix JkᵀJk + μkI is positive definite and the search direction dk in Step 2 is always well defined. Notice that dk ≠ 0. In fact, if dk = 0, then by (3.4) we have ‖pk − ∇ψ(xk)‖ = 0. Since ‖pk‖ ≤ ρ‖∇ψ(xk)‖ with ρ ∈ (0,1), it follows that ‖∇ψ(xk)‖ = ‖pk‖ = 0, which contradicts ∇ψ(xk) ≠ 0. In Step 3, if the descent condition (3.7) holds, then ∇ψ(xk)ᵀdk ≤ −χ‖dk‖² < 0. Otherwise, dk = −∇ψ(xk), which gives ∇ψ(xk)ᵀdk = −‖∇ψ(xk)‖² < 0. Thus, the direction dk used in the line search (3.8) is always a descent direction of ψ. Next we show that there exists at least one nonnegative integer l satisfying (3.8). On the contrary, suppose that for all nonnegative integers l,

ψ(xk + δ^l dk) > Θk + ζδ^l∇ψ(xk)ᵀdk,

which together with ψ(xk) ≤ Θk gives

(ψ(xk + δ^l dk) − ψ(xk))/δ^l > ζ∇ψ(xk)ᵀdk. (3.10)

By letting l → ∞ in (3.10), we have ∇ψ(xk)ᵀdk ≥ ζ∇ψ(xk)ᵀdk, and hence ∇ψ(xk)ᵀdk ≥ 0, which contradicts ∇ψ(xk)ᵀdk < 0. So we can obtain xk+1 in Step 4. Now we show ψ(xk+1) ≤ Θk+1. In fact, if the condition (3.6) holds, then

ψ(xk+1) = (1/2)‖F(xk + dk)‖² ≤ ξ²ψ(xk) ≤ Θk.

Otherwise, by Step 3, we also have ψ(xk+1) ≤ Θk. Hence, from (3.9) it holds that ψ(xk+1) ≤ Θk+1. Therefore, we can conclude that if ψ(xk) ≤ Θk and ∇ψ(xk) ≠ 0 for some k, then xk+1 can be generated by Algorithm 3.1 with ψ(xk+1) ≤ Θk+1. Since ψ(x0) = Θ0, the theorem follows by induction on k. □
Theorem 3.2. Every accumulation point x∗ of a sequence {xk} generated by Algorithm 3.1 is a stationary point of ψ(x), i.e., ∇ψ(x∗)=0.
Proof. By Steps 2 and 3, we have ψ(xk+1) ≤ Θk for all k ≥ 0. This and (3.9) yield that the sequence {Θk} is nonincreasing and bounded below by zero. Thus there exists Θ∗ ≥ 0 such that limk→∞ Θk = Θ∗. Further, by (3.9) we have limk→∞ (Θk − ψ(xk+1)) = 0, and so

limk→∞ ψ(xk) = limk→∞ Θk = Θ∗.

So, if there are infinitely many k for which ‖F(xk + dk)‖ ≤ ξ‖F(xk)‖ holds, then √(2Θ∗) ≤ ξ√(2Θ∗), which together with ξ ∈ (0,1) yields Θ∗ = 0, i.e., limk→∞ F(xk) = 0. By continuity, we have the desired result. Next, we assume that there exists an index k̄ such that ‖F(xk + dk)‖ > ξ‖F(xk)‖ for all k ≥ k̄, i.e., λk is determined by (3.8) for all k ≥ k̄. Since x∗ is an accumulation point of {xk}, there exists a subsequence {xk}k∈K, where K ⊂ {0,1,...}, such that lim_{K∋k→∞} xk = x∗. We assume that ∇ψ(x∗) ≠ 0 and will derive a contradiction. Since ∇ψ(x∗) = J(x∗)ᵀF(x∗) ≠ 0, we have ‖J(x∗)‖ > 0 and ‖F(x∗)‖ > 0. Moreover, by continuity, we have

lim_{K∋k→∞} μk = σ‖F(x∗)‖^α + (1 − σ)‖∇ψ(x∗)‖^α =: μ∗.

Obviously, μ∗ > 0. So there exists a positive constant μ̄ such that μk ≥ μ̄ > 0 for all k ∈ K. Since {∇ψ(xk)} is bounded on any convergent subsequence {xk}k∈K, for any k ∈ K, either

‖dk‖ = ‖(JkᵀJk + μkI)⁻¹(pk − ∇ψ(xk))‖ ≤ (1 + ρ)‖∇ψ(xk)‖/μ̄ < ∞,

or ‖dk‖ = ‖−∇ψ(xk)‖ < ∞. Hence, the sequence {‖dk‖}k∈K is bounded. Passing to a further subsequence, we suppose lim_{K1∋k→∞} dk = d∗, where K1 ⊂ K is an infinite subset. In the following, we prove ∇ψ(x∗)ᵀd∗ = 0. By (3.8) we have

ψ(xk+1) ≤ Θk + ζλk∇ψ(xk)ᵀdk ≤ Θk − ζχλk‖dk‖².

This and limk→∞ ψ(xk) = limk→∞ Θk = Θ∗ yield limk→∞ λk‖dk‖ = 0. Hence, if λk ≥ λ̄ > 0 for all k ∈ K1, where λ̄ > 0 is a constant, then lim_{K1∋k→∞} dk = d∗ = 0 and hence ∇ψ(x∗)ᵀd∗ = 0. Otherwise, {λk}k∈K1 has a subsequence converging to zero and we suppose lim_{K2∋k→∞} λk = 0, where K2 ⊂ K1 is an infinite set. From (3.8), for all k ≥ k̄ and k ∈ K2,

ψ(xk + (λk/δ)dk) > Θk + ζ(λk/δ)∇ψ(xk)ᵀdk ≥ ψ(xk) + ζ(λk/δ)∇ψ(xk)ᵀdk,

which gives

(ψ(xk + (λk/δ)dk) − ψ(xk))/(λk/δ) > ζ∇ψ(xk)ᵀdk. (3.12)

Since ψ is continuously differentiable at x∗, letting k → ∞ with k ∈ K2 in (3.12), we obtain ∇ψ(x∗)ᵀd∗ ≥ ζ∇ψ(x∗)ᵀd∗, and hence ∇ψ(x∗)ᵀd∗ ≥ 0. On the other hand, since dk is a sufficient descent direction of ψ, we have ∇ψ(x∗)ᵀd∗ = lim_{K2∋k→∞} ∇ψ(xk)ᵀdk ≤ 0. Hence, we conclude that ∇ψ(x∗)ᵀd∗ = 0. Let K̄ := {k ∈ K1 | dk = −∇ψ(xk)}. If K̄ is an infinite set, then

0 = ∇ψ(x∗)ᵀd∗ = −lim_{K̄∋k→∞} ‖∇ψ(xk)‖² = −‖∇ψ(x∗)‖²,

which contradicts the assumption ∇ψ(x∗) ≠ 0. Otherwise, K̄ is a finite set and dk satisfies (3.7) for all sufficiently large k ∈ K1. Then, by (3.7) we have

0 = ∇ψ(x∗)ᵀd∗ ≤ −χ‖d∗‖²,

which gives d∗ = 0. By (3.4), we have for all k ∈ K1,

(JkᵀJk + μkI)dk = −∇ψ(xk) + pk. (3.13)

Since lim_{K1∋k→∞} (JkᵀJk + μkI) = J(x∗)ᵀJ(x∗) + μ∗I, by (3.13) and d∗ = 0, we have

lim_{K1∋k→∞} (pk − ∇ψ(xk)) = 0. (3.14)

Since ‖pk‖ ≤ ρ‖∇ψ(xk)‖, we can deduce from (3.14) that

(1 − ρ)‖∇ψ(x∗)‖ ≤ lim_{K1∋k→∞} ‖pk − ∇ψ(xk)‖ = 0,

which also contradicts the assumption ∇ψ(x∗) ≠ 0. This completes the proof. □
Next, we analyze the convergence rate of Algorithm 3.1. Suppose that the generated iteration sequence {xk} has an accumulation point x∗ such that F(x∗) = 0 and that Assumption 2.1 holds at x∗ with γ = υ = 1. We will show that the whole sequence {xk} converges to x∗ at least superlinearly for any α ∈ (0,2] and θ > 1.
Lemma 3.1. Assume that Assumption 2.1 holds with γ = υ = 1. Let α ∈ (0,2] and θ > 1. If xk, xk + dk ∈ N(x∗, r/2), then there exists ĉ > 0 such that

dist(xk + dk, X∗) ≤ ĉ dist(xk, X∗)^min{α/2+1, θ}. (3.15)
Proof. Since d̄k defined by (2.18) is the minimizer of φk(d) in (2.20), by (2.4) and (2.10),
It holds from (2.20) and (3.16) that
Since dk = d̄k + (JkᵀJk + μkI)⁻¹pk, we have from (2.5), (2.10), (2.11) and (3.17) that
where C̃ = √(L²/4 + a2) + Mb2a1⁻¹. Moreover, by (2.17), α ∈ (0,2] and θ > 1, we have
Thus, by (2.4), (2.17), (3.18) and (3.19), we have
which together with (2.2) gives
Letting ĉ := (C̃ + Lc²/2)κ⁻¹, we complete the proof. □
Lemma 3.2. Under the same conditions as in Lemma 3.1, there exists an index k̄ such that for all k ≥ k̄ it holds that: (i) xk, xk + dk ∈ N(x∗, r/2); (ii) ‖F(xk + dk)‖ ≤ ξ‖F(xk)‖.
Proof. By Lemma 3.1, the result can be proved similarly to the proof of [9, Lemma 11]. □
Theorem 3.3. Under the same conditions as in Lemma 3.1, the whole sequence {xk} converges to x∗ with

‖xk+1 − x∗‖ = O(‖xk − x∗‖^min{α/2+1, θ}).
Proof. By Lemma 3.2 and Step 2 of Algorithm 3.1, we have xk+1 = xk + dk and ‖F(xk+1)‖ ≤ ξ‖F(xk)‖ for all k ≥ k̄. It follows that limk→∞ ‖F(xk)‖ = 0, which together with (2.2) yields limk→∞ dist(xk, X∗) = 0. Thus, from Lemma 3.1, for all sufficiently large k,

dist(xk+1, X∗) ≤ ĉ dist(xk, X∗)^min{α/2+1, θ}. (3.20)

Since min{α/2+1, θ} > 1 and limk→∞ dist(xk, X∗) = 0, we have ĉ dist(xk, X∗)^(min{α/2+1, θ}−1) < 1/2 for all sufficiently large k. It follows that for all sufficiently large k,

dist(xk+1, X∗) ≤ (1/2) dist(xk, X∗). (3.21)

This implies that for all sufficiently large k,
that is,
By (3.19) and Lemma 3.1, for all sufficiently large k,
which together with limk→∞ dist(xk, X∗) = 0 gives limk→∞ dk = 0 and
By (3.20) and (3.22), for all sufficiently large k,
So, when k is sufficiently large, (3.23) gives
This and limk→∞ dk = 0 yield limk→∞ xk = x∗. Further, by (3.21) and (3.24) we have
The proof is complete. □
4. Numerical results
We apply Algorithm 3.1 to solve the nonlinear equations arising from the well-known linear complementarity problem (LCP): find v ∈ Rⁿ such that

v ≥ 0, Mv + q ≥ 0, vᵀ(Mv + q) = 0, (4.1)

where M ∈ R^(n×n) and q ∈ Rⁿ are a given matrix and vector. To reformulate the LCP as an equivalent system of equations, we define the function ϕ: R² → R as
where sgn(t) := 1 if t > 0, sgn(t) := 0 if t = 0, and sgn(t) := −1 if t < 0.
Proposition 4.1. (i) ϕ(a,b)=0⟺a≥0,b≥0,ab=0.
(ii) ϕ is continuously differentiable at any (a,b)∈R2 whose gradient is given by
Proof. Let f(t) := sgn(t)t². Since f is a bijective function, it follows that
The result (ii) holds because f(t) is continuously differentiable everywhere with f′(t) = 2|t|. □
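The display defining ϕ is not recoverable from this version of the text. The sketch below assumes the Mangasarian-type choice ϕ(a,b) = f(|a − b|) − f(a) − f(b) with f(t) = sgn(t)t², which matches the proof's use of f and satisfies both parts of Proposition 4.1; this specific form is our assumption, not necessarily the authors' formula.

```python
import numpy as np

def f(t):
    # f(t) = sgn(t)*t^2 = t*|t|; strictly increasing with f(0) = 0, f'(t) = 2|t|.
    return t * np.abs(t)

def phi(a, b):
    # ASSUMED Mangasarian-type form: phi(a,b) = f(|a-b|) - f(a) - f(b);
    # one checks phi(a,b) = 0  <=>  a >= 0, b >= 0, ab = 0 (Proposition 4.1(i)).
    return f(np.abs(a - b)) - f(a) - f(b)

def grad_phi(a, b):
    # Since f(|a-b|) = (a-b)^2, the partial derivatives are
    # d/da = 2(a-b) - 2|a| and d/db = 2(b-a) - 2|b|, both continuous,
    # in line with Proposition 4.1(ii).
    return 2.0 * (a - b) - 2.0 * np.abs(a), 2.0 * (b - a) - 2.0 * np.abs(b)
```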
Let x := (u, v). By using the function ϕ, solving the LCP is equivalent to computing a solution of the smooth nonlinear equations (4.3). In the following, we apply Algorithm 3.1 to solve the nonlinear equations (4.3). The parameters are chosen as ρ = 10⁻³, ξ = 0.5, χ = 10⁻⁵, ζ = 10⁻⁵, δ = 0.8, θ = 1, τ = 0.5, and σ, α are specified in the individual experiments. In Step 2, GMRES is used as the linear solver to find the inexact direction dk. Moreover, we use ‖F(xk)‖ ≤ 10⁻⁵ as the stopping criterion.
We test two classes of LCPs defined as follows:
LCP (i): Let M be the block diagonal matrix with N1ᵀN1/‖N1ᵀN1‖, ..., N4ᵀN4/‖N4ᵀN4‖ as diagonal blocks, i.e., M = diag(NiᵀNi/‖NiᵀNi‖) with Ni = rand(n/4, n/4) for i = 1,...,4. Take q = rand(n,1). Obviously, the matrix M is positive semidefinite.
LCP (ii): Let M = diag(Ni/‖Ni‖ − eye(n/4)) with Ni = rand(n/4, n/4) for i = 1,...,4. Take q = rand(n,1).
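The two test classes translate directly into numpy, where we read rand(·,·) as MATLAB's uniform (0,1) matrix and ‖·‖ as the spectral norm (both assumptions on our part). The function lcp_residual assembles an assumed form of the system (4.3) from the components described above, reusing phi and grad_phi from the previous sketch and algorithm_3_1 from Section 3.

```python
import numpy as np

def make_lcp(n, variant=1, seed=0):
    """Build M and q for LCP (i) (variant=1) or LCP (ii) (variant=2)."""
    rng = np.random.default_rng(seed)
    m = n // 4
    M = np.zeros((n, n))
    for i in range(4):
        N = rng.random((m, m))
        if variant == 1:
            B = N.T @ N
            B = B / np.linalg.norm(B, 2)               # N_i^T N_i / ||N_i^T N_i||
        else:
            B = N / np.linalg.norm(N, 2) - np.eye(m)   # N_i/||N_i|| - eye(n/4)
        M[i * m:(i + 1) * m, i * m:(i + 1) * m] = B
    return M, rng.random(n)

def lcp_residual(M, q, x):
    # ASSUMED form of (4.3): with x = (u, v), F stacks u - Mv - q and the
    # componentwise values phi(u_i, v_i), so a root satisfies u = Mv + q
    # together with u >= 0, v >= 0 and u_i v_i = 0.
    n = len(q)
    u, v = x[:n], x[n:]
    return np.concatenate([u - M @ v - q, phi(u, v)])

def lcp_jacobian(M, q, x):
    # Jacobian of the assumed residual, using grad_phi from the sketch above.
    n = len(q)
    u, v = x[:n], x[n:]
    da, db = grad_phi(u, v)
    return np.vstack([np.hstack([np.eye(n), -M]),
                      np.hstack([np.diag(da), np.diag(db)])])

# Initial point as in the experiments: v0 = (1,0,...,0)^T, u0 = M v0 + q.
n = 100
M, q = make_lcp(n, variant=1)
v0 = np.zeros(n); v0[0] = 1.0
x0 = np.concatenate([M @ v0 + q, v0])
sol = algorithm_3_1(lambda y: lcp_residual(M, q, y),
                    lambda y: lcp_jacobian(M, q, y), x0, sigma=1.0, alpha=1.0)
print(np.linalg.norm(lcp_residual(M, q, sol)))   # <= 1e-5 on success
```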
We use v0 = (1,0,...,0)ᵀ and u0 = Mv0 + q as the initial point. Tables 1 and 2 show the numerical results of Algorithm 3.1 with different values of σ and α, in which IT denotes the number of iterations, CPU denotes the CPU time in seconds, Fx denotes the value of ‖F(xk)‖ at the final iterate, and "–" indicates that the algorithm fails to find a solution. These numerical results show that Algorithm 3.1 is efficient for solving LCPs: it finds a solution meeting the desired accuracy in very few iterations and in a short CPU time. Moreover, our numerical experience indicates that Algorithm 3.1 with σ = 1, i.e., μk = ‖Fk‖^α, has an advantage over σ = 0, i.e., μk = ‖JkᵀFk‖^α. Finally, we point out that we tested Algorithm 3.1 with different values of τ and found that the numerical performance is essentially the same.
5. Conclusions
We have presented a family of inexact Levenberg-Marquardt methods for solving nonlinear equations. The presented LM methods take more general LM parameters and perturbation vectors, namely convex combinations of ‖Fk‖^α and ‖JkᵀFk‖^α, and of ‖Fk‖^(α+θ) and ‖JkᵀFk‖^(α+θ), respectively. Under the Hölderian local error bound condition and the Hölderian continuity of the Jacobian, we have derived an explicit formula for the convergence order of these inexact LM methods. Moreover, we have developed a family of globally convergent inexact LM methods and demonstrated its effectiveness through numerical experiments.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This research was supported by the Natural Science Foundation of Henan Province (222300420520) and the Key Scientific Research Projects of Higher Education of Henan Province (22A110020).
Conflict of interest
All authors declare no conflicts of interest in this paper.