1. Introduction
1.1. Background overview
In numerous practical applications, such as neural network optimization [1,2], image segmentation [3], signal processing [4,5,6,7], matrix equations [8,9], and chemical equilibrium analysis [10], nonlinear systems of equations with convex constraints play an essential role. In these applications, the goal is typically to find solutions that satisfy the convex constraints. Therefore, research into efficient numerical methods for solving nonlinear systems of equations with convex constraints is not only of significant theoretical interest but also of considerable practical importance. In this paper, we focus on nonlinear systems of equations with convex constraints, which can be formulated as follows:

h(x) = 0, \quad x \in H, \qquad (1.1)

where H \subseteq \mathbb{R}^n is a non-empty closed convex set, and h: \mathbb{R}^n \to \mathbb{R}^n is a continuous monotone mapping that is not necessarily smooth. Throughout this paper, \| \cdot \| denotes the Euclidean vector norm.
At present, numerical iterative methods for solving nonlinear systems of equations can be broadly categorized into two major types. The first type includes methods that rely on the Jacobian matrix or its approximations, such as Newton's methods [11,12], quasi-Newton methods [13], trust-region methods [14], and Levenberg–Marquardt methods [15,16]. These methods are known for rapid local convergence, especially when an appropriate initial point is selected. However, their effectiveness depends on computing and storing the Jacobian matrix or an approximation of it. This dependency can lead to algorithm failure or inefficiency, particularly when the Jacobian matrix is difficult to obtain or when the nonlinear system exhibits non-smooth characteristics. The second type includes methods that do not rely on the Jacobian matrix or its approximations, such as conjugate gradient (CG) methods [17,18,19], spectral gradient methods [20], and other first-order gradient-based methods [21]. These methods are structurally simple, require only first-order information, and involve no matrix storage, which makes them particularly well-suited for solving large-scale nonlinear systems of equations. Due to their simplicity and scalability, these methods have gained popularity among researchers, especially in applications where computational resources are limited or where the systems are too large for Jacobian-based methods to be practical.
In recent years, the CG method has garnered significant attention from researchers in the field of optimization, becoming a focal point of study. As an efficient iterative method for solving linear systems of equations and unconstrained optimization problems, the CG method has demonstrated notable advantages, particularly in handling large-scale problems. Its versatility has been further enhanced through combination with projection techniques, leading to the development of CG projection methods. For example, Liu et al. [22] proposed a new derivative-free spectral Dai–Yuan (DY) type projection method for solving the problem (1.1). Under reasonable assumptions, their study not only established the global convergence of the proposed method but also demonstrated its linear convergence rate. In addition, Hu et al. [23] developed a modified Hestenes–Stiefel (HS) type CG projection method, which integrates the steepest descent approach with the traditional CG method and was applied to image restoration problems. Ma et al. [24] designed a modified inertial three-term CG projection method for solving the problem (1.1), which incorporates an inertial extrapolation step into the search direction and was applied to sparse signal and image restoration problems.
The CG projection method updates its search direction using the most recent gradient information and the previous search direction. While this approach generally performs well, it can suffer from slow or unstable convergence in certain situations. To address these issues, the three-term CG projection method introduces a new way of updating the search direction, which incorporates not only the current gradient information but also information from previous iterations. The key idea is to leverage a richer set of historical information to enhance the search direction, with the aim of accelerating convergence and improving solution accuracy. Without loss of generality, the iterative formula for the three-term CG projection method is given by x_{k+1} = x_k + \alpha_k d_k , where \alpha_k is the step size determined by a line search mechanism. The search direction d_k is defined as:
where \beta_k is a conjugate parameter, \theta_k is a scalar parameter, h_k \triangleq h(x_k) , and y_{k-1} = h_k - h_{k-1} . The parameters \beta_k and \theta_k play a critical role in the performance of the CG projection method, influencing the algorithm's convergence behavior. Traditional CG methods such as Polak–Ribière–Polyak (PRP), Hestenes–Stiefel (HS), and Liu–Storey (LS) are well known in the field, with the conjugate parameter expressions given below.
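In the derivative-free setting of this paper, where the gradient is replaced by h_k , these classical parameters take the following standard form (restated here for reference; their denominators are precisely the quantities combined in the safeguard \delta_k introduced in Subsection 1.2):

\beta_k^{PRP} = \frac{h_k^\text{T} y_{k-1}}{\|h_{k-1}\|^2}, \qquad \beta_k^{HS} = \frac{h_k^\text{T} y_{k-1}}{d_{k-1}^\text{T} y_{k-1}}, \qquad \beta_k^{LS} = \frac{h_k^\text{T} y_{k-1}}{-h_{k-1}^\text{T} d_{k-1}}.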
Given the outstanding numerical performance of the PRP and HS CG methods, Yin et al. [25] introduced a hybrid three-term CG projection method for solving the problem (1.1), with a particular focus on compressed sensing problems. Additionally, Gao et al. [26] designed an adaptive three-term CG search direction and validated the effectiveness of the proposed method through extensive numerical experiments. Their results demonstrated that the adaptive approach is highly competitive, particularly in solving sparse signal problems.
1.2. Formulation of search direction
To provide a smoother and more coherent introduction to the ideas and algorithms presented in this paper, we begin by reviewing the relevant contributions of the works [27,28]. Existing three-term CG methods, originally developed for unconstrained optimization problems, have demonstrated remarkable efficiency and wide applicability in that setting. Building on the well-established HS CG method, Li et al. [27] presented a three-term HS-type CG method for solving unconstrained optimization problems. Their conjugate and scalar parameters are defined as follows:
where t_k = \min\{\hat{t}, \max\{0, y_{k-1}^\text{T}(y_{k-1} - s_{k-1})/\|y_{k-1}\|^2\}\} , s_{k-1} = x_k - x_{k-1} , and 0 < \hat{t} < 1 . In addition, Li et al. [28] proposed a three-term PRP-type CG method, also aimed at unconstrained optimization problems. In their approach, the corresponding conjugate and scalar parameters are modified by substituting \|h_{k-1}\|^2 with d_{k-1}^\text{T} y_{k-1} , yielding the following parameters:
By analyzing the denominators in (1.2), (1.3), and (1.4), we introduce a new notation \delta_k = \mu \|d_{k-1}\| \|y_{k-1}\| + \max\{\|h_{k-1}\|^2, d_{k-1}^\text{T} y_{k-1}, -h_{k-1}^\text{T} d_{k-1}\} with \mu > 0 . Together with the construction forms of (1.3) and (1.4), we design a hybrid modified PRP-HS-LS-type search direction as follows:
where \beta_k^{MPHL} = h_k^\text{T} y_{k-1}/\delta_k - \|y_{k-1}\|^2 h_k^\text{T} d_{k-1}/\delta_k^2 and \theta_k^{MPHL} = t_k h_k^\text{T} d_{k-1}/\delta_k . This hybrid construction ensures that the conjugate parameter retains the desirable properties of the PRP, HS, and LS parameters. Additionally, the resulting search direction satisfies the sufficient descent and trust region properties without relying on any line search mechanism.
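To make the construction concrete, the following minimal Python sketch assembles \delta_k , t_k , \beta_k^{MPHL} , and \theta_k^{MPHL} from the definitions above. It assumes the standard three-term form d_k = -h_k + \beta_k^{MPHL} d_{k-1} + \theta_k^{MPHL} y_{k-1} (with d_0 = -h_0 ), which is consistent with the descent bound of Lemma 1 but is stated here as an assumption rather than a verbatim reproduction of (1.5); the guard added to the t_k denominator and the default parameter values are ours.

```python
import numpy as np

def mphl_direction(h_k, h_prev, d_prev, x_k, x_prev, mu=2.0, t_hat=0.99):
    """Sketch of the hybrid MPHL search direction of Subsection 1.2.

    Assumes d_k = -h_k + beta * d_{k-1} + theta * y_{k-1}; the paper's
    display (1.5) may differ in details.  Requires mu > 0 and 0 < t_hat < 1.
    """
    y = h_k - h_prev                       # y_{k-1} = h_k - h_{k-1}
    s = x_k - x_prev                       # s_{k-1} = x_k - x_{k-1}
    # Safeguarded denominator delta_k = mu*||d||*||y|| + max{||h||^2, d^T y, -h^T d}.
    delta = mu * np.linalg.norm(d_prev) * np.linalg.norm(y) + max(
        np.dot(h_prev, h_prev), np.dot(d_prev, y), -np.dot(h_prev, d_prev))
    # t_k = min{t_hat, max{0, y^T (y - s) / ||y||^2}} (small guard against ||y|| = 0).
    t_k = min(t_hat, max(0.0, np.dot(y, y - s) / max(np.dot(y, y), 1e-30)))
    beta = np.dot(h_k, y) / delta - np.dot(y, y) * np.dot(h_k, d_prev) / delta ** 2
    theta = t_k * np.dot(h_k, d_prev) / delta
    return -h_k + beta * d_prev + theta * y
```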
2. Description of the proposed algorithm
In this section, we will elaborate on the line search mechanism, the projection operator, and the detailed procedure of the proposed algorithm.
First, the line search mechanism [7,29] is employed to determine a suitable step size that ensures progress towards a solution. Specifically, the trial point z_k = x_k + \alpha_k d_k is evaluated, where the step size \alpha_k = \beta \rho^{i_k} . Here, \beta > 0 is a predetermined scaling factor that initializes the search, and \rho \in (0, 1) is a contraction factor that shrinks the step size iteratively. The integer i_k is the smallest nonnegative integer i such that the following condition holds:
where the parameter \sigma \in (0, 1) determines the strength of the sufficient decrease condition.
Second, the projection operator [7,24] is a fundamental tool in the design and theoretical analysis of various algorithms, particularly those addressing convex constrained optimization problems. The projection operator P_H[\cdot] , which maps a point of \mathbb{R}^n onto the non-empty closed convex set H , is defined as P_H[x] = \arg\min\{\|y - x\| : y \in H\} .
This operator plays a critical role in maintaining feasibility within the constraint set H . Moreover, P_H[\cdot] possesses the non-expansive property, i.e., \|P_H[x] - P_H[y]\| \leq \|x - y\| for all x, y \in \mathbb{R}^n .
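As a concrete illustration (not part of the paper's algorithmic description), the projection onto the non-negative orthant H = \mathbb{R}^n_+ , which is the constraint set of most test problems in Section 4, reduces to a componentwise truncation:

```python
import numpy as np

def project_nonneg(x):
    """Projection onto H = R^n_+ : componentwise maximum with zero."""
    return np.maximum(x, 0.0)

# Illustrative check of the non-expansive property on random points.
rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
assert np.linalg.norm(project_nonneg(x) - project_nonneg(y)) <= np.linalg.norm(x - y)
```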
Finally, based on the concepts discussed above, we propose a hybrid modified PRP-HS-LS-type CG projection algorithm (Abbr. Algorithm MPHL). The step-by-step procedure of the proposed algorithm for solving the problem (1.1) is outlined below.
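The following Python skeleton sketches the overall iteration under stated assumptions: the hybrid search direction of Subsection 1.2 (passed in as direction, e.g., the mphl_direction sketch above), a backtracking line search \alpha_k = \beta \rho^{i} with a generic sufficient-decrease test standing in for condition (2.1), and the hyperplane-projection update x_{k+1} = P_H[x_k - \gamma \chi_k h(z_k)] with \chi_k = h(z_k)^\text{T}(x_k - z_k)/\|h(z_k)\|^2 , which is standard in this literature (compare the use of \chi_k in the proof of Lemma 2). The exact steps of Algorithm MPHL may differ.

```python
import numpy as np

def mphl_solver(h, project, direction, x0, beta=1.0, rho=0.74, sigma=1e-4,
                gamma=1.3, eps=1e-6, max_iter=2000):
    """Skeleton of a derivative-free CG projection method for h(x) = 0, x in H.

    `h` is the monotone mapping, `project` the projection onto H, and
    `direction(h_k, h_prev, d_prev, x_k, x_prev)` returns the search direction.
    The acceptance test below is a generic stand-in for condition (2.1).
    """
    x = np.asarray(x0, dtype=float).copy()
    hx = h(x)
    d = -hx
    for _ in range(max_iter):
        if np.linalg.norm(hx) <= eps:
            break
        # Backtracking line search: alpha = beta * rho**i, i = 0, 1, 2, ...
        alpha = beta
        while True:
            z = x + alpha * d
            hz = h(z)
            if -np.dot(hz, d) >= sigma * alpha * np.linalg.norm(d) ** 2:
                break
            alpha *= rho
        if np.linalg.norm(hz) <= eps:      # the trial point already solves the system
            return z
        # Hyperplane projection step (standard in this literature).
        chi = np.dot(hz, x - z) / np.dot(hz, hz)
        x_prev, h_prev, d_prev = x, hx, d
        x = project(x - gamma * chi * hz)
        hx = h(x)
        d = direction(hx, h_prev, d_prev, x, x_prev)
    return x
```

For the test problems of Section 4 with H = \mathbb{R}^n_+ , one would pass, for instance, project = lambda x: np.maximum(x, 0.0) and the mphl_direction sketch as direction.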
3. Analysis of algorithmic properties
3.1. Properties of search direction
The following lemma demonstrates that the search direction d_k satisfies two key properties: the sufficient descent condition and the trust region property.
Lemma 1. If Algorithm MPHL generates a sequence \{d_k\} , then the search direction d_k satisfies the following conditions:

h_k^\text{T} d_k \leq -a_1 \|h_k\|^2, \qquad (3.1)

a_1 \|h_k\| \leq \|d_k\| \leq a_2 \|h_k\|, \qquad (3.2)
where a_1 = 1 - (1 + \hat{t})^2/4 and a_2 = 1 + 1/\mu + 1/\mu^2 + \hat{t}/\mu .
Proof. For k = 0 , (1.5) yields that (3.1) and (3.2) are obviously satisfied. For k \geq 1 , multiplying both sides of (1.5) by h_k^\text{T} , we obtain:
where the inequality follows from the relation 2a^\text{T} b \leq \|a\|^2 + \|b\|^2 . This shows that (3.1) holds.
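For completeness, a short worked version of this estimate, assuming the three-term form d_k = -h_k + \beta_k^{MPHL} d_{k-1} + \theta_k^{MPHL} y_{k-1} discussed in Subsection 1.2, reads:

h_k^\text{T} d_k = -\|h_k\|^2 + (1 + t_k)\frac{(h_k^\text{T} y_{k-1})(h_k^\text{T} d_{k-1})}{\delta_k} - \frac{\|y_{k-1}\|^2 (h_k^\text{T} d_{k-1})^2}{\delta_k^2} \leq -\|h_k\|^2 + \frac{(1 + t_k)^2}{4}\|h_k\|^2 \leq -a_1 \|h_k\|^2,

where the first inequality uses the Cauchy–Schwarz inequality together with 2ab \leq a^2 + b^2 applied to a = \frac{1 + t_k}{2}\|h_k\| and b = \|y_{k-1}\| |h_k^\text{T} d_{k-1}|/\delta_k , and the last inequality uses t_k \leq \hat{t} .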
Next, using the Cauchy–Schwarz inequality, (3.1) can be simplified to obtain \|d_k\| \geq a_1 \|h_k\| . From the definition of \beta_k^{MPHL} , we have:
In addition, from the definition of \theta_k^{MPHL} , we have:
Together with the above inequalities, (1.5) yields:
This shows that (3.2) holds. □
3.2. Properties of convergence
To systematically analyze the global convergence of Algorithm MPHL, we first introduce some general assumptions. Throughout the analysis, we assume that the sequences \{x_k\} and \{z_k\} generated by Algorithm MPHL are infinite; otherwise, the last iterate is a solution of the problem (1.1). The assumptions are as follows:
(S1) The solution set of the problem (1.1) is non-empty, which ensures that the problem is well-posed and that the existence of solutions is guaranteed.
(S2) The function h(x) is monotone, meaning that (h(x) - h(y))^\text{T}(x - y) \geq 0 holds for any x, y \in \mathbb{R}^n .
The following lemma plays a pivotal role in establishing the global convergence of Algorithm MPHL, providing the foundation for the subsequent analysis and results.
Lemma 2. Let the sequences \{x_k\} , \{z_k\} , and \{d_k\} be generated by Algorithm MPHL. Then the following statements are true:
(i) There exists a step size αk such that the line search mechanism (2.1) is satisfied for all k;
(ii) The sequence \{x_k\} is bounded;
(iii) As the iteration number k approaches infinity, we have \lim\limits_{k \to \infty} \alpha_k \|d_k\| = 0 , meaning that the product of the step size \alpha_k and the norm of the search direction d_k tends to zero.
Proof. First, we aim to prove that statement (i) holds, and we proceed by contradiction. Assume that the line search mechanism (2.1) does not hold. In other words, there exists an iteration index \tilde{k} such that the following inequality is satisfied for any integer i > 0 :
Since 0 < \rho < 1 , the step length \beta \rho^i tends to zero as i \to \infty , so the trial points x_{\tilde{k}} + \beta \rho^i d_{\tilde{k}} converge to x_{\tilde{k}} . By the continuity of h , it follows that h(x_{\tilde{k}} + \beta \rho^i d_{\tilde{k}}) \to h(x_{\tilde{k}}) . Taking the limit as i \to \infty , we obtain the following expression:
On the other hand, from a previously established result given by (3.1), we know that:
This contradicts the limit obtained above. Therefore, we conclude that statement (i) is indeed valid.
Next, we aim to prove that statement (ii) holds. To facilitate the analysis, we begin with two key inequalities. From Assumption (S2), we have the following:
where x^* denotes a solution of the problem (1.1), i.e., h(x^*) = 0 . From the definition of z_k and the condition given by (2.1), we have:
Now, by utilizing the projection operator P_H[\cdot] , the definition of \chi_k , and combining (3.3) and (3.4), we can derive the following inequality:
where the equality follows from expanding the squared norm. The inequality (3.5) shows that the sequence \{\|x_k - x^*\|\} is non-increasing and hence convergent, which implies that the sequence \{x_k\} is bounded. Therefore, we conclude that statement (ii) is indeed valid.
Finally, we proceed to prove that statement (iii) holds. Starting with the inequality in (3.5), we reorganize it to derive the following expression:
This inequality implies that the series \sum\limits_{k = 0}^{\infty} \|x_k - z_k\|^4 is convergent, and hence its terms tend to zero, i.e., \lim\limits_{k \to \infty} \|x_k - z_k\|^4 = 0 , which in turn gives \lim\limits_{k \to \infty} \|x_k - z_k\| = 0 . From the definition of z_k , we know that z_k = x_k + \alpha_k d_k , so \|x_k - z_k\| = \alpha_k \|d_k\| . Hence, \lim\limits_{k \to \infty} \alpha_k \|d_k\| = 0 . □
The global convergence properties of Algorithm MPHL are outlined in the following theorem.
Theorem 1. Let the sequence \{h_k\} be generated by Algorithm MPHL. Then, the sequence satisfies the following convergence property: \lim\limits_{k \to \infty} \|h_k\| = 0 .
Proof. We proceed by contradiction. Assume that there exists a positive constant \vartheta such that \|h_k\| > \vartheta holds for all k \geq 0 . This assumption, combined with (3.2), implies that \|d_k\| \geq a_1 \vartheta . Moreover, due to the continuity of the function h(x) and the boundedness of the sequence \{x_k\} , it follows that the sequence \{h_k\} is also bounded, i.e., there exists another positive constant \iota such that \|h_k\| \leq \iota . Utilizing this, together with (3.2), we deduce that \|d_k\| \leq a_2 \iota . Consequently, the sequence \{d_k\} is bounded. Since \|d_k\| \geq a_1 \vartheta > 0 , Lemma 2 (iii) implies that \lim\limits_{k \to \infty} \alpha_k = 0 .
Since both sequences \{x_k\} and \{d_k\} are bounded, there exists an infinite index set \Gamma such that:
Next, we consider the line search mechanism (2.1), which gives the inequality:
Taking the limit as j \to \infty , we arrive at -h(\acute{x})^\text{T} \acute{d} \leq 0 . Furthermore, from the inequality derived in (3.1), we have:
Taking the limit as j \to \infty , we conclude that -h(\acute{x})^\text{T} \acute{d} \geq a_1 \|h(\acute{x})\| > 0 . This directly contradicts the earlier result that -h(\acute{x})^\text{T} \acute{d} \leq 0 . Therefore, the assumption that \|h_k\| > \vartheta for all k \geq 0 must be false, and we conclude that \lim\limits_{k \to \infty} \|h_k\| = 0 . □
4. Numerical experiment for constrained nonlinear systems of equations
4.1. General setups
This section focuses on applying the Algorithm MPHL to a set of widely recognized test problems and comparing its performance with three existing algorithms: Algorithm PSGM [22], Algorithm PDY [29], and Algorithm PCG [30]. The experiments were conducted on a system running Ubuntu 20.04.2 LTS 64-bit, powered by an Intel(R) Xeon(R) Gold 5115 2.40GHz CPU. This standardized computational environment ensures the reliability and consistency of the results, enabling a fair comparison between the algorithms.
The parameter settings for Algorithm PSGM, Algorithm PDY, and Algorithm PCG are configured according to their respective references. For Algorithm MPHL, the parameters are set as follows: \beta = 1 , \rho = 0.74 , \sigma = 10^{-4} , \gamma = 1.3 , \hat{t} = 10^3 , \mu = 2 , \epsilon = 10^{-6} . The test problems are formulated in the form h(x) = (h_1(x), h_2(x), \ldots, h_n(x))^\text{T} with the variable x = (x_1, x_2, \ldots, x_n) . For each test problem, the dimension n is set to 10,000, 50,000, 100,000, 150,000, and 200,000. The initial points are selected as follows: x_1 = (1, 1, \ldots, 1)^\text{T} , x_2 = (0.1, 0.1, \ldots, 0.1)^\text{T} , x_3 = (\frac{1}{2}, \frac{1}{2^2}, \ldots, \frac{1}{2^{n}})^\text{T} , x_4 = (2, 2, \ldots, 2)^\text{T} , x_5 = (1, \frac{1}{2}, \ldots, \frac{1}{n})^\text{T} , x_6 = (\frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n})^\text{T} , and x_7 = (\frac{n-1}{n}, \frac{n-2}{n}, \ldots, \frac{n-n}{n})^\text{T} . Each algorithm is terminated when \|h_k\| \leq \epsilon or the number of iterations exceeds 2000. The following test problems are considered:
Problem 1. Set
and the constraint set H = \mathbb{R}^n_+ .
Problem 2. Set
and the constraint set H = \{x\in\mathbb{R}^n : \sum\limits_{i = 1}^n x_i \leq n, \; x_i > -1, \; i = 1, 2, \ldots, n \} .
Problem 3. Set
and the constraint set H = \mathbb{R}^n_+ .
Problem 4. Set
and the constraint set H = \mathbb{R}^n_+ .
Problem 5. Set
and the constraint set H = \mathbb{R}^n_+ .
Problem 6. Set
and the constraint set H = \mathbb{R}^n_+ .
Problem 7. Set
and the constraint set H = \mathbb{R}^n_+ .
4.2. Numerical reports
The numerical results of the test problems, solved by the four algorithms, are presented in Tables 1–7. In these tables, "Init( n )" denotes the initial point and the dimension n in thousands, "CPUT" denotes the CPU time in seconds, "NF" denotes the number of function evaluations, and "NIter" denotes the number of iterations. The results show that all four algorithms successfully solved the test problems under different initial points and dimensions. To visually assess the performance of the four algorithms, we employed the performance profiles technique [31] to plot performance curves based on CPUT, NF, and NIter. In these plots, the higher the curve, the better the corresponding algorithm performs. As shown in Figures 1–3, Algorithm MPHL outperformed the others in a significant portion of the test cases, winning approximately 52%, 51%, and 48% of the test problems in terms of CPUT, NF, and NIter, respectively. The performance curves of Algorithm MPHL generally lie above those of Algorithm PSGM, Algorithm PDY, and Algorithm PCG, indicating its superior efficiency and effectiveness in solving these test problems.
5. Numerical experiment for sparse signal restoration
5.1. General setups
A key challenge in compressed sensing is recovering sparse signals from under-determined systems of linear equations [32]. The objective is to identify sparse solutions that satisfy a specific set of linear constraints. We consider the following under-determined system of linear equations:
where \theta is an observed signal, \Theta\in R^{m\times n} is a linear mapping, and \tilde{x} is an original sparse signal. It is inherently difficult to restore the sparse signal \tilde{x} from the observed signal \theta due to the ill-posed nature of the linear system, which often lacks a unique solution. To tackle this challenge, regularization techniques are often employed, which impose additional constraints to promote sparsity in the solution. Specifically, the problem can be formulated as the minimization of a combined \ell_1 - \ell_2 norm:
where \varpi > 0 is the regularization parameter, and \|x\|_1 and \|x\|_2 represent the \ell_1 and \ell_2 norms, respectively. Clearly, the problem (5.1) is a convex, unconstrained optimization problem. This convexity is crucial as it ensures the existence of a global minimum, which can be efficiently identified using various optimization algorithms. The regularization parameter \varpi balances the trade-off between the sparsity of the solution and the fidelity to the observed signal.
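For completeness, the standard form of this model, which we assume coincides with the paper's problem (5.1) following the usual convention of [32,33], is

\min\limits_{x \in \mathbb{R}^n} \; \varpi \|x\|_1 + \frac{1}{2}\|\Theta x - \theta\|_2^2 .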
The process of solving the problem (5.1) begins with reformulating the problem into a convex quadratic form [33]. This involves expressing any x \in R^n as the difference of two non-negative vectors, such that x = u - v with u \geq 0 and v \geq 0 ,
where u_i = (x_i)_{+} and v_i = (-x_i)_{+} for all i = 1, 2, \dots, n , with (\cdot)_{+} denoting \max\{0, \cdot\} . Together with the problem (5.1), this results in the following optimization problem:
where e_n = (1, 1, \dots, 1)^\text{T} \in R^n . For ease of notation, we define the following:
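The quantities typically introduced at this point in the GPSR-type reformulation [33] are restated below; they are given here as an assumed convention rather than a verbatim reproduction of the paper's display:

z = \begin{pmatrix} u \\ v \end{pmatrix} \in \mathbb{R}^{2n}, \qquad B = \begin{pmatrix} \Theta^\text{T}\Theta & -\Theta^\text{T}\Theta \\ -\Theta^\text{T}\Theta & \Theta^\text{T}\Theta \end{pmatrix}, \qquad c = \varpi e_{2n} + \begin{pmatrix} -\Theta^\text{T}\theta \\ \Theta^\text{T}\theta \end{pmatrix},

with which (5.2) can be written, up to an additive constant, as the bound-constrained quadratic program \min\limits_{z \geq 0} \frac{1}{2} z^\text{T} B z + c^\text{T} z , presumably the content of (5.3).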
According to these notations, the problem (5.2) can be reformulated as follows:
Additionally, according to the work [34], the problem (5.3) was further reformulated, and it was demonstrated to be equivalent to the following:
As a result, our proposed algorithm can be applied to sparse signal restoration.
To evaluate the performance of Algorithm MPHL, its numerical results are compared with those of Algorithm PSGM, Algorithm PDY, and Algorithm PCG. The context of this evaluation is sparse signal restoration, which involves recovering an original signal of length n from an observed signal of length m , where m < n . The performance of the algorithms is characterized using the following metrics: (i) Mean Squared Error (MSE); (ii) NIter; (iii) CPUT; (iv) objective function value (objFun).
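For reference, MSE is understood here in its usual form (our restatement, assumed to match the paper's definition):

\text{MSE} = \frac{1}{n}\|\hat{x} - \tilde{x}\|^2 ,

where \hat{x} denotes the recovered signal and \tilde{x} the original signal.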
5.2. Numerical reports
In this experiment, we consider two signal sizes: (n, m, k) = (2^{12}, 2^{10}, 2^{7}) and (n, m, k) = (2^{13}, 2^{11}, 2^{8}) . Figures 4 and 6 sequentially display the original signal, the observed signal, and the recovered signals obtained by each algorithm. The results indicate that all algorithms successfully recover the original signal. However, Algorithm MPHL achieves signal recovery with fewer iterations and shorter CPU time compared to the other algorithms. Further analysis reveals that as the signal size increases, Algorithm MPHL shows only a modest increase in the number of iterations and CPU time. Additionally, Figures 5 and 7 provide a comparative analysis of the convergence results for the four algorithms in terms of MSE, NIter, CPUT, and objFun. These figures illustrate that the MSE values and objective function values for all four algorithms ultimately approach zero, indicating that each algorithm can achieve high-quality restoration results. Notably, Algorithm MPHL demonstrates faster convergence, outperforming the other algorithms in terms of both efficiency and computational cost.
6. Conclusions
In this paper, we proposed a novel hybrid PRP-HS-LS-type conjugate gradient algorithm tailored for solving nonlinear systems of equations with convex constraints. The proposed algorithm incorporates a hybrid technique to design the conjugate parameter, combining the strengths of PRP, HS, and LS conjugate gradient methods to enhance performance and stability. The search direction, formulated by using the hybrid conjugate parameter, inherently satisfies the sufficient descent condition and trust region properties. Remarkably, this is achieved without the need for a line search mechanism. The global convergence of the proposed algorithm is rigorously established under general conditions. Notably, this proof does not rely on the often restrictive Lipschitz continuity assumption, broadening the applicability of the algorithm to a wider class of problems. Extensive numerical experiments demonstrate the algorithm's superior efficiency, particularly in solving large-scale nonlinear systems of equations with convex constraints. Additionally, the proposed algorithm proves highly effective in addressing sparse signal restoration problems. In the future, we will discuss the potential extension of our algorithm to complex variables.
Author contributions
Xuejie Ma: Conceptualization, Investigation, Writing–original draft, Writing–review and editing, Funding acquisition; Songhua Wang: Conceptualization, Funding acquisition, Writing–review and editing. All authors have read and approved the final version of the manuscript for publication.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This work is supported by the Natural Science Foundation in Guangxi Province, PR China (grant number 2024GXNSFAA010478) and the Guangzhou Huashang College Daoshi Project (grant number 2024HSDS15).
Conflict of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.