1. Introduction
Consider the following unconstrained optimization problem
\[
\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}
\]
where f(x): ℝ^n → ℝ is a continuously differentiable function. Problems of this form arise in many fields, such as aerospace, engineering technology, economics, and finance [1,2,3]. The trust region method is an important method for solving (1.1), and it has attracted the attention of many researchers [4,5,6]. Trust region methods usually compute a trial step s_k by solving the following quadratic subproblem
\[
\min_{s \in \mathbb{R}^n} m^{(k)}(x_k + s) = f(x_k) + g_k^T s + \frac{1}{2} s^T B_k s, \quad \text{s.t. } \|s\| \le \Delta_k, \tag{1.2}
\]
where g_k = ∇f(x_k), B_k ∈ ℝ^{n×n} is the Hessian matrix of f at the current iteration point x_k or a symmetric approximation of it, s is the trial step, Δ_k > 0 is the trust region radius, and ‖⋅‖ stands for the Euclidean norm. Trust region methods take the ratio of the actual reduction to the predicted reduction,
\[
r_k = \frac{f(x_k) - f(x_k + s_k)}{m^{(k)}(x_k) - m^{(k)}(x_k + s_k)},
\]
to decide whether to accept the trial step and how to adjust the trust region radius. If r_k is close to 1, the trial step s_k should be accepted and Δ_k can be increased. If r_k is too small, the trial step s_k should be rejected, Δ_k should be decreased, and the subproblem (1.2) should be solved again. If r_k is much larger than 1, the case of a 'too successful iteration' might occur. In addition, trust region methods have traditionally been regarded as effective for small and medium scale optimization problems because of the cost of computing and storing the matrix B_k^{−1} at each iteration. Many researchers [7,8,9,10,11,12,13,14,15,16] have considered modifications of trust region methods to adapt them to large-scale optimization. Our work focuses on the construction of the subproblem and the adjustment of the trust region radius.
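For illustration, the classical accept/adjust rule can be sketched as follows in MATLAB; the thresholds 0.25 and 0.75 and the scaling factors 0.5 and 2 are common textbook choices, not the parameters of the algorithm proposed in this paper.

% Sketch of the classical trust region acceptance and radius update.
% f: f(x_k); fs: f(x_k + s_k); predRed: m_k(x_k) - m_k(x_k + s_k) > 0.
function [x, Delta, accepted] = tr_update(x, s, f, fs, predRed, Delta)
    r = (f - fs) / predRed;                 % ratio of actual to predicted reduction
    accepted = (r > 0.25);                  % accept the trial step?
    if accepted
        x = x + s;
    end
    if r < 0.25
        Delta = 0.5 * Delta;                % poor model agreement: shrink
    elseif r > 0.75 && abs(norm(s) - Delta) < 1e-12
        Delta = 2 * Delta;                  % very good agreement at the boundary: expand
    end
end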
In recent years, some trust region methods [11,12,13,14,15] based on simple models for solving large-scale optimization problems were proposed. For example, Sun et al. [13] developed a nonmonotone trust region algorithm with simple quadratic models, in which the Hessian matrix in the subproblem is a diagonal positive definite matrix. Li et al. [14] proposed a simulated annealing-based trust region Barzilai-Borwein (BB) method, and [15] proposed nonmonotone trust region BB methods. These methods all approximate the Hessian matrix of the objective function f(x) by a scalar matrix whose scalar is the reciprocal of the BB stepsize. In the above methods, the amount of computation and storage is greatly reduced.
The matrix B_k in subproblem (1.2) usually satisfies the classic secant equation (see [17])
\[
B_{k+1} s_k = y_k, \tag{1.3}
\]
where s_k = x_{k+1} − x_k and y_k = g_{k+1} − g_k.
Ebadi et al. [18] provided two new secant equations
where
so
where
Constructing the approximation of the Hessian matrix based on formulas (1.4) and (1.5) allows the algorithm to retain third-order curvature information of the objective function at the current iteration point, and to use both gradient and function values as well as information from the three most recent points. This can improve the efficiency of the algorithm, which attracted our attention; we therefore try to introduce these two secant equations into the trust region algorithm.
It is usually difficult to satisfy the secant Eq (1.3) with a nonsingular scalar matrix. Many researchers [16,19,20] considered some alternative conditions that could maintain the accumulated curvature information along the negative gradient. For example, Dennis and Wolkowicz [20] introduced a weaker form by projecting the secant Eq (1.3) in the direction s_k as follows:
\[
s_k^T B_{k+1} s_k = s_k^T y_k. \tag{1.6}
\]
Zhou et al. [16] considered a generalization of the weak secant Eq (1.6) and proposed a new simple model trust region method with a generalized BB parameter. Inspired by the above work, we try to introduce two new weak secant equations that carry more information.
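To see why a weak secant equation is compatible with a scalar model, substitute B_{k+1} = γ_{k+1} I into (1.6):
\[
\gamma_{k+1} s_k^T s_k = s_k^T y_k \quad \Longrightarrow \quad \gamma_{k+1} = \frac{s_k^T y_k}{s_k^T s_k},
\]
which is exactly the reciprocal of the BB stepsize mentioned above. A single scalar can always satisfy the projected Eq (1.6) (provided s_k^T y_k > 0), whereas the full secant Eq (1.3) generally cannot be satisfied by a scalar matrix.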
Updating strategy of the trust region radius may significantly affect the number of iterations. Many researchers proposed adaptive trust region methods [21,22,23,24,25,26] to adjust the trust region radius. For example, Zhang et al. [24] proposed an adaptive trust region radius
\[
\Delta_k = c^p \|g_k\| \|\hat{B}_k^{-1}\|, \tag{1.7}
\]
where c ∈ (0,1), p is a nonnegative integer, and B̂_k = B_k + iI is a positive definite matrix, for some i ∈ ℕ. Under the same parameters, Shi et al. [25] proposed another adaptive trust region radius
with the vector parameter q_k ∈ ℝ^n satisfying the angle condition
\[
-\frac{g_k^T q_k}{\|g_k\| \|q_k\|} \ge o,
\]
where o∈(0,1). Rezaee and Babaie-Kafaki [26] proposed an adaptive choice for the trust region radius based on an eigenvalue analysis conducted on the scaled memoryless quasi-Newton updating formulas
where
An adaptive trust region radius of this type does not use the Hessian matrix explicitly, so, compared with other adaptive trust region methods, updating the trust region radius is inexpensive.
Note that some trust region methods require a monotone reduction of the objective function, which may slow the convergence rate in the presence of a narrow curved valley. Nonmonotone trust region methods [14,27,28,29,30] were therefore proposed. Li et al. [14] proposed a simulated annealing-based trust region BB method. Their nonmonotone strategy is defined by a modified Metropolis criterion, which dynamically controls the acceptance probability of the subproblem solution by introducing adaptive parameters (such as a temperature parameter) into the accept-reject strategy. In the early stage of iteration, a higher acceptance probability helps the algorithm escape local optimal solutions; in the later stage, the acceptance probability is gradually reduced so that the method converges to a better solution.
Our research aims to propose an adaptive simple model trust region algorithm based on new weak secant equations for solving large-scale optimization problems. The contributions of our work are listed as follows:
∙ Two new weak secant equations are introduced, which make use of both gradient and function values and utilize information from the three most recent points. A simple trust region subproblem is also constructed.
∙ In order to enable the algorithm to accept more trial steps, the nonmonotone strategy is defined by a modified Metropolis criterion.
∙ To overcome the case of "too successful iteration", an adaptive strategy is introduced to adjust the trust region radius.
The rest of this paper is organized as follows. An adaptive simple trust region algorithm based on new weak secant equations is proposed in the next section. In Section 3, the global convergence and local superlinear convergence of the new algorithm are established under mild assumptions. Section 4 reports numerical experiments that demonstrate the effectiveness of the algorithm. Conclusions are drawn in the last section.
2. The proposed method
In this section, two new weak secant equations are introduced based on the formulas (1.4) and (1.5). On this basis, a simple trust region subproblem is constructed and a new trust region method for solving large-scale optimization problems is proposed.
Based on the formulas (1.4) and (1.5), we introduce the following new weak secant equations
where s̄_k, ȳ_k, and z_k are the same as in formulas (1.4) and (1.5). Let the matrix B_k in subproblem (1.2) be a scalar matrix γ_k I satisfying (2.1) or (2.2); then we get
or
A simple trust region subproblem can then be constructed as follows:
\[
\min_{\|s\| \le \Delta_k} m^{(k)}(x_k + s) = f(x_k) + g_k^T s + \frac{\gamma_k}{2} \|s\|^2. \tag{2.5}
\]
Suppose ‖g_k‖ ≠ 0. The solution of subproblem (2.5) is given by:
(i) If ‖g_k‖ ≤ γ_k Δ_k, then s_k = −g_k/γ_k;
(ii) If ‖g_k‖ > γ_k Δ_k, then s_k = −(Δ_k/‖g_k‖) g_k.
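The two cases above translate directly into code. A minimal MATLAB sketch, assuming γ_k > 0 (which Step 4 of the algorithm below guarantees via κ_1 ≤ γ_k ≤ κ_2):

% Closed-form solution of the simple subproblem (2.5):
% minimize g'*s + (gamma/2)*(s'*s) subject to ||s|| <= Delta, gamma > 0.
function s = simple_tr_step(g, gamma, Delta)
    if norm(g) <= gamma * Delta
        s = -g / gamma;              % interior step: unconstrained minimizer
    else
        s = -(Delta / norm(g)) * g;  % boundary step along the negative gradient
    end
end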
The ratio r_k can be rewritten as
\[
r_k = \frac{f(x_k) - f(x_k + s_k)}{-g_k^T s_k - \frac{\gamma_k}{2} \|s_k\|^2}. \tag{2.6}
\]
In our algorithm, the following modified Metropolis criterion is used to determine whether to accept the trial step
where T_k is the temperature at the k-th iteration, 0 < τ < 1 is a sufficiently small real number, and the temperature T_k decreases to 0 as k → ∞. The modified Metropolis criterion is embedded into the trust region method: p_k and r_k together determine whether the trial step is accepted. Unlike the traditional trust region algorithm, when r_k ≤ τ, the modified Metropolis criterion accepts more iterates with a certain probability, thereby reducing the amount of computation and improving the convergence rate of the algorithm.
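For concreteness, a MATLAB sketch of such an acceptance probability is given below. The expression exp(−(τ − r_k)/T_k) for r_k < τ is our illustrative assumption: it has the properties used later in Section 3 (p_k = 1 when r_k ≥ τ, and p_k → 0 as T_k → 0 for poor steps); the exact form of (2.7) follows Li et al. [14].

% Sketch of a modified Metropolis acceptance probability p_k.
% The branch for r < tau is an assumed stand-in for (2.7).
function p = metropolis_pk(r, tau, T)
    if r >= tau
        p = 1;                       % good agreement: always accept
    else
        p = exp(-(tau - r) / T);     % decaying acceptance probability
    end
end

In Step 3 of the algorithm below, p_k is compared with a threshold l_k ∈ [e^{−v}, e^{−1/v}], and the trial step is accepted when p_k > l_k.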
Based on the above analysis, we propose an adaptive simple model trust region algorithm (ASMTR) with new weak secant equations.
ASMTR algorithm
Step 0. Set ε > 0, β ∈ (0,1), 0 < τ < u < 1, v > 1, c ∈ (0,1), p = 0, and 0 < κ_1 ≤ κ_2. Given x_0 ∈ ℝ^n, compute g_0 and set s_0 = −g_0, x_1 = x_0 + s_0. Choose T_1 > 0, Δ_1 > 0, γ_1 = 1, and set k := 1.
Step 1. Compute gk. If ‖gk‖<ε, then stop.
Step 2. If ‖g_k‖ ≤ γ_k Δ_k, then s_k = −g_k/γ_k; otherwise, s_k = −(Δ_k/‖g_k‖) g_k.
Step 3. Compute r_k and p_k by (2.6) and (2.7), respectively, and choose l_k ∈ [e^{−v}, e^{−1/v}]. If p_k > l_k, then x_{k+1} = x_k + s_k; otherwise, x_{k+1} = x_k.
Step 4. Compute γk+1 by (2.3) or (2.4). If γk+1≤κ1, set γk+1=κ1. If γk+1≥κ2, set γk+1=κ2.
Step 5. If r_k > u, set p = 0; otherwise, set p = p + 1. Compute Δ_{k+1} by (1.7).
Step 6. Set Tk+1=βTk, k:=k+1, and go to Step 1.
Remark 2.1. The formula (1.7) is used to update the trust region radius; since B̂_{k+1} = γ_{k+1} I here, it reads
\[
\Delta_{k+1} = c^p \frac{\|g_{k+1}\|}{\gamma_{k+1}}.
\]
Remark 2.2. From Step 4 of the ASMTR algorithm, we know that the sequence {γ_k} is uniformly bounded, i.e., κ_1 ≤ γ_k ≤ κ_2 for all k.
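To fix ideas, the following MATLAB sketch assembles one pass of the ASMTR loop from the pieces above. It relies on the reconstructed Step 2 and Step 3, on the specialized radius formula of Remark 2.1, and on the function handles pk_fun and lk_fun, which stand in for the criterion (2.7) and the threshold l_k ∈ [e^{−v}, e^{−1/v}]; the update of γ by (2.3) or (2.4) is left as a placeholder, since it depends on the new weak secant equations.

% Sketch of the ASMTR main loop (illustrative, not the authors' code).
% fun/grad return f(x) and its gradient; pk_fun and lk_fun are stand-ins
% for the criterion (2.7) and the threshold l_k.
function x = asmtr_sketch(fun, grad, x, pk_fun, lk_fun)
    epsv = 1e-4; beta = 0.99; u = 0.15; tau = 0.1; c = 0.5;   % Step 0
    kappa1 = 2; kappa2 = 100; gamma = 1; T = 200; p = 0;
    Delta = 1; maxit = 500;
    for k = 1:maxit
        g = grad(x);
        if norm(g) < epsv, break; end                % Step 1: stopping test
        if norm(g) <= gamma * Delta                  % Step 2: solve (2.5)
            s = -g / gamma;
        else
            s = -(Delta / norm(g)) * g;
        end
        pred = -g' * s - 0.5 * gamma * (s' * s);     % predicted reduction
        r = (fun(x) - fun(x + s)) / pred;            % ratio (2.6)
        if pk_fun(r, tau, T) > lk_fun()              % Step 3: accept or reject
            x = x + s;
        end
        % Step 4: gamma would be recomputed from (2.3) or (2.4) here,
        % then safeguarded into [kappa1, kappa2]:
        gamma = min(max(gamma, kappa1), kappa2);
        if r > u, p = 0; else, p = p + 1; end        % Step 5
        Delta = c^p * norm(grad(x)) / gamma;         % radius via (1.7)
        T = beta * T;                                % Step 6: cooling
    end
end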
3. Convergence analysis
In this section, we will discuss the convergence of the ASMTR algorithm. We would like to make the following assumptions:
Assumption (i) Let f: ℝ^n → ℝ be twice continuously differentiable and bounded below on the level set
\[
L(x_0) = \{ x \in \mathbb{R}^n : f(x) \le f(x_0) \}, \quad \forall x_0 \in \mathbb{R}^n.
\]
(ii) Let the gradient g(x) be uniformly continuous on a compact convex set Ω that contains the level set L(x_0).
Assumptions (i) and (ii) imply that ‖∇²f(x)‖ is continuous and uniformly bounded on Ω, so there exists a positive constant L such that
\[
\|\nabla^2 f(x)\| \le L, \quad \forall x \in \Omega.
\]
Therefore, from the mean value theorem, we have
\[
\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in \Omega,
\]
which ensures that g(x) is Lipschitz continuous on Ω.
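For completeness, the mean value step reads
\[
g(x) - g(y) = \int_0^1 \nabla^2 f\bigl(y + t(x - y)\bigr)(x - y)\, dt,
\]
so that, by the convexity of Ω (the segment between x and y stays in Ω) and the bound ‖∇²f‖ ≤ L,
\[
\|g(x) - g(y)\| \le \sup_{t \in [0,1]} \bigl\|\nabla^2 f\bigl(y + t(x - y)\bigr)\bigr\| \, \|x - y\| \le L \|x - y\|.
\]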
Lemma 1. Suppose that s_k is the solution of subproblem (2.5) and the sequence {x_k} is generated by the ASMTR algorithm. If ‖g_k‖ ≠ 0, then
\[
m^{(k)}(x_k) - m^{(k)}(x_k + s_k) \ge \frac{\delta_1}{2} \|g_k\| \min\left\{ \Delta_k, \frac{\|g_k\|}{\gamma_k} \right\},
\]
where δ_1 ∈ (0,1].
Proof. The proof is similar to the proof of Lemma 4.1 in [16]. □
Lemma 2. Suppose that Assumptions (i) and (ii) hold. The solution s_k of the simple model (2.5) satisfies
\[
|f(x_k + s_k) - m^{(k)}(x_k + s_k)| = O(\|s_k\|^2).
\]
Proof. According to the second-order Taylor expansion, we have
\[
f(x_k + s_k) = f(x_k) + g_k^T s_k + \frac{1}{2} s_k^T \nabla^2 f(x_k + \theta_k s_k) s_k, \quad \theta_k \in (0,1).
\]
By the definition of m^{(k)}(x_k + s) in (2.5), we get
\[
|f(x_k + s_k) - m^{(k)}(x_k + s_k)| = \frac{1}{2} \bigl| s_k^T \nabla^2 f(x_k + \theta_k s_k) s_k - \gamma_k \|s_k\|^2 \bigr| \le \frac{L + \kappa_2}{2} \|s_k\|^2 = O(\|s_k\|^2).
\]
We complete the proof. □
Lemma 3. Suppose that Assumption (i) holds. Let the sequence {x_k} be generated by the ASMTR algorithm; then there exists a sufficiently large N > 0 such that
\[
f(x_k) - f(x_k + s_k) \ge \frac{\tau \delta_1}{4} \|g_k\| \min\left\{ \Delta_k, \frac{\|g_k\|}{\gamma_k} \right\}
\]
holds for all k > N.
Proof. If rk≥τ>0, then pk=1>lk holds. By Lemma 1, we have
and
If rk<τ, then we accept xk+sk as the new iterate point when
By Step 6 of the ASMTR algorithm, it can be deduced that
Combining with lk∈[e−v,e−1/v], we have
Then, for any given ε_1 > 0, there exists N > 0 such that
holds for all k>N. By a simple manipulation on (3.1), we get
then by the definition of rk, there is
Set ε_1 = τ/2. According to Lemma 1, we have
We complete the proof. □
Lemma 4. Suppose that Assumptions (i) and (ii) hold. Let the sequence {x_k} be generated by the ASMTR algorithm. If x_k is not the solution of the problem, i.e., ‖g_k‖ ≠ 0, then the iteration (2.9) terminates in finitely many steps.
Proof. Suppose, by contradiction, that there is K_1 > 0 such that the iteration (2.9) cycles infinitely for k ≥ K_1; then ‖g_k‖ ≥ ε. Thus, by Step 3 of the ASMTR algorithm, we have p_k < l_k for all k > K_1, so
By Step 5 of the ASMTR algorithm, we have
From Lemmas 1 and 2, we get
Combining (3.3) and (3.4), we obtain
This implies that, for arbitrarily given ε2>0, there exists K2>0 such that rk>1−ε2 holds for ∀k>K2. Since 0<τ<u<1, by letting 0<ε2<1−u, we get
Taking K=max{K1,K2}, we have that (3.2) and (3.5) simultaneously hold for ∀k>K, leading to a contradiction. □
Theorem 1. Let the sequence {x_k} be generated by the ASMTR algorithm; then we have
\[
\lim_{k \to \infty} \|g_k\| = 0.
\]
Proof. If the ASMTR algorithm terminates in finitely many steps, then the conclusion obviously holds. Now consider the case of infinitely many successful iterations. According to Assumption (i), the sequence {f(x_k)} is bounded below, i.e., there is an a ∈ ℝ such that f(x_k) ≥ a holds for all k.
It is obtained from Lemma 3 that
By (2.10), we have
so
Adding (3.6) with respect to k yields
Noting that fN+K+1>a for any K>0 and taking limit on both sides of (3.7) as K→∞, we have
which deduces
We complete the proof. □
Theorem 2. Suppose that Assumptions (i) and (ii) hold. Also, assume the ASMTR algorithm generates an infinite sequence {x_k} converging to the optimal solution x*, where the matrix ∇²f(x*) is positive definite and ∇²f(x) is Lipschitz continuous in a neighborhood of x*. If the following condition holds
\[
\lim_{k \to \infty} \frac{\|(\gamma_k I - \nabla^2 f(x^*)) s_k\|}{\|s_k\|} = 0, \tag{3.8}
\]
then the sequence {x_k} converges to x* superlinearly.
Proof. Since ∇2f(x∗) is positive definite and ∇2f(x) is continuous in a neighborhood of x∗, there exist positive scalars h and ψ such that
for all
Also, there exists a positive integer k̄ such that x_k ∈ Ω̃ for all k ≥ k̄.
From the Taylor expansion and inequality (3.9), for sufficiently large indices k, we have
for some ζk∈(0,1). So, from (1.7), we get
where c∈(0,1). Considering Lemma 4, p is finite in each iteration.
On the other hand, from the Taylor expansion, we have
for some ςk∈(0,1). Thus,
Dividing both sides by ‖sk‖, we get
So, from Lipschitz continuity of ∇2f(x) on ˜Ω and (3.8), we have
which, from (3.10), yields
implying that the sequence {xk} converges to x∗ superlinearly. □
4. Numerical experiments
In the current section, we show the numerical performance of the ASMTR algorithm. The test problems are unconstrained problems from the CUTEr library [31] (a widely used testing environment for optimization software) and from Andrei [32,33]. All codes are written in MATLAB R2015b and run on a PC with a 1.19 GHz CPU and 8.00 GB of RAM. We implement two variants of the new algorithm:
(1) ASMTR1: the ASMTR algorithm with γ_{k+1} computed by (2.3);
(2) ASMTR2: the ASMTR algorithm with γ_{k+1} computed by (2.4).
Two new algorithms are compared with the following two algorithms. The first is the simulated annealing-based trust region BB method (SATRBB) [14], whose nonmonotone technique is defined by the modified Metropolis criterion; the second is the nonmonotone trust region BB methods (NTBB) [15]. The parameters are given by: T1=200, v=10, β=0.99, τ=0.1, c1=0.25, c2=0.75, Δ1=1, and ε=10−4 for the SATRBB algorithm, γ=0.1, η1=0.25, η2=0.75, and M=5 for the NTBB algorithm, and u=0.15, c=0.5, κ1=2, κ2=100, and γ1=1 for the ASMTR algorithm.
All test algorithms are terminated when the condition ‖g_k‖ < ε = 10^{−4} is satisfied.
In addition, an algorithm is stopped if the number of iterations exceeds 500; in such a case, we declare that the algorithm fails. The values x0, x10, x20, x30, x40, and x50 in the second column correspond to the starting points x_0, 10x_0, −10x_0, 100x_0, −100x_0, and −x_0, where x_0 is the same as in [32,33]. The notations used in the numerical results include the dimension of the problem (n), the initial point of the problem (x0), the number of iterations (k), the CPU time in seconds, the number of function evaluations (nf), and the number of gradient evaluations (ng). The sign "−" means the algorithm fails because the number of iterations exceeds 500. Next, we present some of the numerical results in Examples 4.1–4.4.
Example 4.1. Consider the Broyden banded function
\[
f(x) = \sum_{i=1}^{n} \Bigl( x_i (2 + 5 x_i^2) + 1 - \sum_{j \in J_i} x_j (1 + x_j) \Bigr)^2,
\]
where n is the number of variables, J_i = { j : j ≠ i, max(1, i − m_l) ≤ j ≤ min(n, i + m_u) }, m_l = 5, and m_u = 1. The initial point of this function is x_0 = (−1, ⋯, −1)^T, and the results are listed in Table 1.
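A MATLAB sketch of this test function, assuming the standard Moré-Garbow-Hillstrom form with m_l = 5 and m_u = 1:

% Broyden banded function: sum of squared banded residuals.
function f = broyden_banded(x)
    n = numel(x); ml = 5; mu = 1; f = 0;
    for i = 1:n
        J = [max(1, i - ml):i-1, i+1:min(n, i + mu)];    % banded index set, j ~= i
        ri = x(i) * (2 + 5 * x(i)^2) + 1 - sum(x(J) .* (1 + x(J)));
        f = f + ri^2;
    end
end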
We performed numerical experiments on the Broyden banded function from different initial points. Table 1 shows that ASMTR2 needs fewer iterations, function evaluations, and gradient evaluations. Both ASMTR1 and ASMTR2 are superior to SATRBB and NTBB.
Example 4.2. Consider the Penalty function Ⅰ
\[
f(x) = a \sum_{i=1}^{n} (x_i - 1)^2 + \Bigl( \sum_{j=1}^{n} x_j^2 - \frac{1}{4} \Bigr)^2, \quad a = 10^{-5},
\]
where n is the number of variables. The initial point of this function is x_0 = (1, 2, ⋯, n)^T, and the results are listed in Table 2.
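A MATLAB sketch, assuming the standard form with penalty parameter a = 10^{−5}:

% Penalty function I: regularization term plus a single coupling term.
function f = penalty1(x)
    a = 1e-5;
    f = a * sum((x - 1).^2) + (sum(x.^2) - 0.25)^2;
end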
For four different dimensions, numerical experiments were performed from the same initial point. We find that ASMTR1 uses fewer iterations and fewer function and gradient evaluations than SATRBB and NTBB, and ASMTR2 performs better than ASMTR1.
Example 4.3. Consider the Broyden tridiagonal function
\[
f(x) = \sum_{i=1}^{n} \bigl( (3 - 2 x_i) x_i - x_{i-1} - 2 x_{i+1} + 1 \bigr)^2,
\]
where n is the number of variables and the components obey the convention x_0 = x_{n+1} = 0. The initial point of the function is x_0 = (−1, ⋯, −1)^T, and the results are listed in Table 3.
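A MATLAB sketch, assuming the standard form with the boundary convention x_0 = x_{n+1} = 0:

% Broyden tridiagonal function: sum of squared tridiagonal residuals.
function f = broyden_tridiag(x)
    n = numel(x);
    xp = [0; x(:); 0];                                   % pad boundary zeros
    r = (3 - 2 * xp(2:n+1)) .* xp(2:n+1) - xp(1:n) - 2 * xp(3:n+2) + 1;
    f = sum(r.^2);
end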
For dimensions 10000, 20000, and 50000, five initial points were selected for testing. SATRBB and NTBB win on approximately 13.4% of the test problems in terms of the number of iterations, while ASMTR1 and ASMTR2 win on nearly 86.6%. In addition, ASMTR1 and ASMTR2 need less CPU time for most problems. This indicates that the new algorithms are very effective for large-scale optimization problems.
Example 4.4. Consider the nearly separable function
where n is the number of variables and 1 ≤ j ≤ n. The initial point of this function is
and the results are listed in Table 4.
Table 4 shows the numerical results of the function for different dimensions from the same initial point. SATRBB and NTBB fail to produce results within 500 iterations, while ASMTR1 and ASMTR2 solve the problems within 180 iterations.
For more insight, we use the performance profiles introduced by Dolan and Moré [34] to illustrate the numerical performance of the algorithms on the test functions in Table 5. The 20 test functions are listed in Table 5, with dimensions varying from 2 to 50000; the column "No." gives the number of each function. Here, we also include the nonmonotone adaptive conic trust region (NACTR) method [7], a nonmonotone adaptive trust region method based on a simple conic model, in the comparison. This method needs little memory and computational effort.
Based on the numerical results of all the test problems, we present the performance profiles for the number of iterations, the CPU time, the number of function evaluations, and the number of gradient evaluations. In a performance profile plot, the horizontal axis gives a factor τ of the best achieved performance ratio, while the vertical axis gives the percentage ψ of test problems that a method solves within a factor τ of the best method; in particular, ψ(1) is the percentage of problems on which a method is the fastest, and the height of the curve as τ grows measures robustness.
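A minimal MATLAB sketch of how such profiles are computed from the raw results; each row of T holds one problem's cost (e.g., iterations or CPU time) for every solver, with failures marked Inf (bsxfun is used for compatibility with R2015b):

% Dolan-More performance profile [34]: rho(i,j) is the fraction of
% problems that solver j solves within a factor tau(i) of the best solver.
function [tau, rho] = perf_profile(T)
    [np, ns] = size(T);                                  % #problems x #solvers
    ratio = bsxfun(@rdivide, T, min(T, [], 2));          % performance ratios
    tau = unique(ratio(isfinite(ratio)));                % sorted breakpoints
    rho = zeros(numel(tau), ns);
    for i = 1:numel(tau)
        rho(i, :) = sum(ratio <= tau(i), 1) / np;
    end
end

For example, the value of ρ at τ = 1 gives the fraction of problems on which each solver is the fastest.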
Figures 1–4 plot the performance profiles for the number of iterations, the CPU time, the number of function evaluations, and the number of gradient evaluations, respectively. It can be observed that the profiles of ASMTR1 and ASMTR2 rise faster than those of the other algorithms. In short, ASMTR1 and ASMTR2 outperform SATRBB, NACTR, and NTBB in all aspects. In the overall trend, ASMTR2 performs slightly better than ASMTR1, so we consider ASMTR2 more competitive. For large-scale problems in particular, the ASMTR algorithm exhibits strong numerical stability. From the above analysis, we conclude that the algorithm proposed in our work is quite competitive.
5. Conclusions
In this paper, we propose an adaptive simple model trust region algorithm based on new weak secant equations. It is worth noting that the trust region subproblem of the algorithm is solved in closed form, which is simpler than in many other trust region methods proposed in the literature. We discuss the benefits of constructing a simple model using information from the three most recent points, and the algorithm combines a nonmonotone strategy defined by a modified Metropolis criterion with an adaptive strategy. The global convergence and local superlinear convergence of the new algorithm are established under appropriate conditions. Numerical experiments show that the proposed algorithm is effective. Our research still has limitations; for example, more efficient and widely used adaptive trust region radii are not considered. We will therefore explore a new adaptive trust region radius in the future to obtain a more effective and robust method.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This work was supported by the Key Project of the Natural Science Foundation Joint Fund of Jilin Province (YDZJ202101ZYTS167, YDZJ202201ZYTS303, YDZJ202201ZYTS320, YDZJ202101ZYTS156) and the Graduate Innovation Project of Beihua University (2022001, 2022038).
Conflict of interest
All authors declare no conflicts of interest in this paper.