The worst-case scenario: robust portfolio optimization with discrete distributions and transaction costs

Ebenezer Fiifi Emire Atta Mills

doi:10.3934/math.20241018

AIMS Mathematics

2024, Volume 9, Issue 8: 20919-20938.


Research article


Ebenezer Fiifi Emire Atta Mills ^{1,2}

1. School of Mathematical Sciences, Wenzhou-Kean University, Wenzhou, China
2. Academy of Research for Sustainability (AIRs), Wenzhou-Kean University, Wenzhou, China

Received: 14 March 2024 Revised: 27 May 2024 Accepted: 30 May 2024 Published: 28 June 2024
MSC : 91B05, 91G10

This research introduces min-max portfolio optimization models that incorporate transaction costs and focus on robust Entropic value-at-risk. The study offers a unified approach to handling the distribution of random parameters that affect the reward and risk aspects. Using the duality theorem, the study transforms the optimization models into tractable forms, thereby accommodating discrete box and ellipsoidal distributions of the underlying random variables. The impact of transaction costs on optimal portfolio selection is examined through numerical examples under a robust return-risk framework. The results underscore the importance of the proposed model in safeguarding capital and reducing exposure to extreme risks, outperforming other strategies documented in the literature. This demonstrates the model's effectiveness in balancing return maximization against loss minimization, making it a valuable tool for investors who seek to navigate uncertain financial markets.


Citation: Ebenezer Fiifi Emire Atta Mills. The worst-case scenario: robust portfolio optimization with discrete distributions and transaction costs[J]. AIMS Mathematics, 2024, 9(8): 20919-20938. doi: 10.3934/math.20241018



1. Introduction

Systems of nonlinear equations are fundamental to a diverse range of applications, including power flow analysis ^[1], economic equilibrium modeling ^[2], generalized Bregman distance proximal point methods ^[3], and traffic assignment ^[4]. They also arise in monotone variational inequalities ^[5,6] and compressed sensing problems ^[7,8]. Given the ubiquity and significance of such problems across these domains, efficient numerical methods for solving systems of nonlinear equations are of considerable practical importance. In this paper, we focus on systems of nonlinear equations subject to convex constraints, which can be formulated as follows:

$\begin{equation} \theta(a) = 0, \quad a \in \Theta, \end{equation}$

(1.1)

where $\Theta \subseteq \mathbb{R}^n$ is a non-empty closed convex set, and the function $\theta: \mathbb{R}^n \rightarrow \mathbb{R}^n$ is assumed to be continuously differentiable and monotone, that is,

$\langle \theta(a) - \theta(b), a - b \rangle \geq 0, \quad \forall a, b \in \mathbb{R}^n.$

A gradient-type method generally generates a sequence $\{a_k\}$ via

$\begin{equation*} a_{k+1} = a_k + t_k d_k, \quad k \geq 0, \end{equation*}$

where $t_k$ is the step length and $d_k$ is the search direction. Different choices of $d_k$ give rise to different gradient-type methods, such as steepest descent, Newton, and quasi-Newton methods ^[9,10]. Newton and quasi-Newton methods, along with their numerous variants, have been studied extensively owing to their fast local convergence. For instance, Mahdavi et al. ^[11] proposed and analyzed a nonmonotone quasi-Newton algorithm for strongly convex multiobjective optimization, establishing its global convergence and local superlinear convergence rate under certain conditions. Sihwail et al. ^[12] proposed a hybrid method, Newton-Harris hawks optimization, that combines Newton's method with Harris hawks optimization to solve systems of nonlinear equations. Moreover, Krutikov et al. ^[13] showed that quasi-Newton methods applied to strongly convex functions with a Lipschitz gradient achieve geometric convergence without relying on local quadratic approximations. Despite these advantages, Newton and quasi-Newton methods require the Jacobian (or, in optimization, the Hessian) matrix or an approximation of it at each iteration, which significantly increases the computational cost. This requirement can be limiting for large-scale problems, where such matrices are expensive to compute and store.

The conjugate gradient method ^[14,15] is one of the most effective gradient-type methods. It is widely valued for its efficiency, simplicity, low storage requirements, and reliable convergence properties, which make it well suited to large-scale systems of nonlinear equations ^[16,17]. Its search direction is typically defined as follows:

$d_k = \begin{cases} -\theta_k, & k = 0, \\ -\theta_k + \beta_k d_{k-1}, & k \geq 1, \end{cases}$

where $\theta_k \triangleq \theta(a_k)$ and $\beta_k$ is the conjugate parameter, whose choice distinguishes the various conjugate gradient methods. Many conjugate gradient methods with different conjugate parameters have been developed ^[18]. For instance, Ma et al. ^[19] proposed a modified inertial three-term conjugate gradient method for solving nonlinear monotone equations with convex constraints. This method is globally and Q-linearly convergent and has shown superior numerical performance in applications such as sparse signal recovery and image restoration in compressed sensing. Liu et al. ^[20] introduced a spectral conjugate gradient method with an inertial factor for solving nonlinear pseudo-monotone equations over a convex set. Additionally, Sabiu et al. ^[21] developed an optimal scaled Perry conjugate gradient method for large-scale systems of monotone nonlinear equations, which is globally convergent under monotonicity and Lipschitz continuity.

Inspired by the classical Liu-Storey (LS) ^[22] and Rivai-Mohamad-Ismail-Leong (RMIL) ^[23] conjugate parameters, and combining the hybrid technique (e.g., ^[24,25]) with the projection approach, we develop a modified LS-RMIL-type conjugate gradient projection algorithm for solving systems of nonlinear equations with convex constraints. Throughout this paper, $\|\cdot\|$ and $\langle \cdot, \cdot \rangle$ denote the Euclidean norm and the inner product of vectors, respectively.

2. Algorithm design

2.1. Search direction and its properties

In this section, we build on the search directions of the LS and RMIL methods. Specifically, Liu et al. ^[22] and Rivai et al. ^[23] introduced the conjugate parameters

$\begin{equation*} \beta_k^{LS} = \frac{\langle \theta_k, y_{k-1} \rangle}{- \langle \theta_{k-1}, d_{k-1} \rangle}, \quad \beta_k^{RMIL} = \frac{\langle \theta_k, y_{k-1} \rangle}{\|d_{k-1}\|^2}. \end{equation*}$
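As a small illustration of the two classical parameters, the Python sketch below evaluates $\beta_k^{LS}$ and $\beta_k^{RMIL}$ for given vectors $\theta_k$, $\theta_{k-1}$, and $d_{k-1}$. The helper names `beta_ls` and `beta_rmil` are hypothetical and vectors are plain Python lists; it is an illustrative sketch, not code from the original work.

```python
def dot(u, v):
    """Euclidean inner product <u, v> of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def beta_ls(theta_k, theta_prev, d_prev):
    """Liu-Storey parameter: <theta_k, y_{k-1}> / (-<theta_{k-1}, d_{k-1}>)."""
    y = [a - b for a, b in zip(theta_k, theta_prev)]  # y_{k-1} = theta_k - theta_{k-1}
    return dot(theta_k, y) / (-dot(theta_prev, d_prev))


def beta_rmil(theta_k, theta_prev, d_prev):
    """RMIL parameter: <theta_k, y_{k-1}> / ||d_{k-1}||^2."""
    y = [a - b for a, b in zip(theta_k, theta_prev)]
    return dot(theta_k, y) / dot(d_prev, d_prev)


# Both parameters share the numerator <theta_k, y_{k-1}> and differ only in the denominator.
theta_prev, d_prev, theta_k = [1.0, 1.0], [-2.0, 0.0], [2.0, 0.0]
print(beta_ls(theta_k, theta_prev, d_prev), beta_rmil(theta_k, theta_prev, d_prev))  # prints 1.0 0.5
```

When $d_{k-1} = -\theta_{k-1}$ (for example, at $k = 1$), both denominators equal $\|\theta_{k-1}\|^2$, so the two parameters coincide.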

Building on these parameters, we adopt the hybrid technique (e.g., ^[24,25]) to combine their key features. This yields a new conjugate parameter, which we incorporate into a three-term search direction. Our primary objective is a search direction that satisfies both the sufficient descent and trust-region properties, which are critical for the robustness and efficiency of the method. The designed search direction $d_k$ is defined as follows:

$\begin{equation} d_k = \left\{ \begin{array}{ll} -\theta_k, & k = 0, \\ -\theta_k + \beta_k d_{k-1} + \varpi_k y_{k-1}, & k \geq 1, \end{array} \right. \end{equation}$

(2.1)

where the conjugate parameter $\beta_k$ and the scalar parameter $\varpi_k$ are given by

$\begin{equation} \beta_k = \frac{\langle \theta_k, y_{k-1} \rangle}{c_k} - \frac{\|y_{k-1}\|^2 \langle \theta_{k}, d_{k-1} \rangle}{c_k^2} \quad \text{and} \quad \varpi_k = \frac{\nu_k \langle \theta_k, d_{k-1} \rangle}{c_k}, \end{equation}$

(2.2)

with $y_{k-1} = \theta_k - \theta_{k-1}$. The scalar $c_k$, which safeguards the denominators in (2.2), is defined by

$c_k = \max \left\{ \mu \|d_{k-1}\| \|y_{k-1}\|, - \langle \theta_{k-1}, d_{k-1} \rangle, \| d_{k-1} \|^2 \right\},$

where $\mu$ is a positive constant. The scalar $\nu_k$ fine-tunes the search direction and is defined as $\nu_k = \min\{\tilde{\nu}, \max\{\bar{\nu}_k, 0\}\}$ with $0 < \tilde{\nu} < 1$, so that $0 \leq \nu_k \leq \tilde{\nu}$. Here $\bar{\nu}_k = \frac{\langle \theta_k, y_{k-1} - s_{k-1} \rangle}{\|\theta_k\|^2}$, where $s_{k-1} = a_k - a_{k-1}$ is the difference between successive iterates.
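To make the definitions (2.1) and (2.2) concrete, the following Python sketch computes $d_k$ for $k \geq 1$ from $\theta_k$, $\theta_{k-1}$, $d_{k-1}$, $a_k$, and $a_{k-1}$. The function name `ilr_direction` is hypothetical, the defaults $\mu = 0.02$ and $\tilde{\nu} = 0.105$ are the values used in Section 4.1, and the sketch assumes $\theta_k \neq 0$ and $d_{k-1} \neq 0$ so that no denominator vanishes.

```python
import math


def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))


def ilr_direction(theta_k, theta_prev, d_prev, a_k, a_prev, mu=0.02, nu_tilde=0.105):
    """Three-term direction d_k of (2.1) for k >= 1, with beta_k and varpi_k from (2.2).

    Assumes theta_k != 0 and d_prev != 0 (otherwise a denominator may vanish).
    """
    y = [p - q for p, q in zip(theta_k, theta_prev)]  # y_{k-1} = theta_k - theta_{k-1}
    s = [p - q for p, q in zip(a_k, a_prev)]          # s_{k-1} = a_k - a_{k-1}
    # c_k = max{mu ||d_{k-1}|| ||y_{k-1}||, -<theta_{k-1}, d_{k-1}>, ||d_{k-1}||^2}
    c = max(mu * math.sqrt(dot(d_prev, d_prev) * dot(y, y)),
            -dot(theta_prev, d_prev),
            dot(d_prev, d_prev))
    # nu_k = min{nu_tilde, max{nu_bar_k, 0}}, nu_bar_k = <theta_k, y - s> / ||theta_k||^2
    nu_bar = dot(theta_k, [yi - si for yi, si in zip(y, s)]) / dot(theta_k, theta_k)
    nu = min(nu_tilde, max(nu_bar, 0.0))
    g_d = dot(theta_k, d_prev)
    beta = dot(theta_k, y) / c - dot(y, y) * g_d / c ** 2  # beta_k in (2.2)
    varpi = nu * g_d / c                                   # varpi_k in (2.2)
    return [-g + beta * dp + varpi * yi for g, dp, yi in zip(theta_k, d_prev, y)]
```

Since the proof of Lemma 1 uses only (2.5) and (2.6), the returned direction should satisfy $\langle \theta_k, d_k \rangle \leq -M \|\theta_k\|^2$ for any inputs meeting these assumptions, which makes a convenient sanity check.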

Before delving into the sufficient descent and trust-region properties of the designed search direction (2.1), we can deduce some important bounds from the definition of $\beta_k$ and $\varpi_k$ . We consider the bound for $\beta_k$ :

$|\beta_k| \leq \frac{\|\theta_k\| \|y_{k-1}\|}{c_k} + \frac{\|y_{k-1}\|^2 \|\theta_k\| \|d_{k-1}\|}{c_k^2},$

which can be further bounded by

$|\beta_k| \leq \frac{\|\theta_k\| \|y_{k-1}\|}{\mu \|d_{k-1}\| \|y_{k-1}\|} + \frac{\|y_{k-1}\|^2 \|\theta_k\| \|d_{k-1}\|}{\mu^2 \|d_{k-1}\|^2 \|y_{k-1}\|^2}.$

Simplifying this, we obtain

$\begin{equation} |\beta_k| \leq \left( \frac{1}{\mu}+\frac{1}{\mu^2} \right) \frac{\|\theta_k\|}{\|d_{k-1}\|}. \end{equation}$

(2.3)

Next, we consider the bound for $\varpi_k$ :

$\begin{equation} |\varpi_k| \leq \frac{\nu_k \|\theta_k\| \|d_{k-1}\|}{c_k} \leq \frac{\tilde{\nu} \|\theta_k\| \|d_{k-1}\|}{\mu \|d_{k-1}\| \|y_{k-1}\|} = \frac{\tilde{\nu}}{\mu} \frac{\|\theta_k\|}{\|y_{k-1}\|}. \end{equation}$

(2.4)

Lemma 1. The search direction $d_k$ generated by (2.1) satisfies the sufficient descent property:

$\langle \theta_k, d_k \rangle \leq -M \|\theta_k\|^2,$

where $M = 1 - \frac{1}{4} (1 + \tilde{\nu})^2$ .

Proof. For $k = 0$, we have $\langle \theta_0, d_0 \rangle = - \|\theta_0\|^2 \leq -M \|\theta_0\|^2$, since $M \leq 1$. For $k \geq 1$, the search direction (2.1) gives

$\begin{equation} \begin{array}{lll} \langle \theta_k, d_k \rangle & = & \langle \theta_k, -\theta_k + \beta_k d_{k-1} + \varpi_k y_{k-1} \rangle \\ & = & -\|\theta_k\|^2 + \frac{\langle \theta_k, y_{k-1} \rangle \langle \theta_{k}, d_{k-1} \rangle}{c_k} - \frac{\|y_{k-1}\|^2 \langle \theta_{k}, d_{k-1} \rangle^2}{c_k^2} + \frac{\nu_k \langle \theta_k, y_{k-1} \rangle \langle \theta_{k}, d_{k-1} \rangle}{c_k} \\ & = & -\|\theta_k\|^2 + (1+\nu_k) \frac{\langle \theta_k, y_{k-1} \rangle \langle \theta_{k}, d_{k-1} \rangle}{c_k} - \frac{\|y_{k-1}\|^2 \langle \theta_{k}, d_{k-1} \rangle^2}{c_k^2} . \end{array} \end{equation}$

(2.5)

By applying the inequality $\langle e_k, g_k \rangle \leq \frac{1}{2} (\|e_k\|^2+\|g_k\|^2)$ , where $e_k = (1 + \nu_k) \theta_k / \sqrt{2}$ and $g_k = \sqrt{2} \langle \theta_k, d_{k-1} \rangle y_{k-1} / c_k$ , we obtain the following result:

$\begin{equation} \frac{(1+\nu_k)\langle \theta_k, y_{k-1} \rangle \langle \theta_{k}, d_{k-1} \rangle}{c_k} \leq \frac{1}{4} (1+\nu_k)^2 \|\theta_k\|^2 + \frac{\langle \theta_k, d_{k-1} \rangle^2 \|y_{k-1}\|^2 }{c_k^2}. \end{equation}$

(2.6)

Substituting (2.6) into (2.5), we obtain

$\langle \theta_k, d_k \rangle \leq -\|\theta_k\|^2 + \frac{1}{4} (1 + \nu_k)^2 \|\theta_k\|^2 \leq -\left(1-\frac{1}{4}(1+\tilde{\nu})^2\right) \|\theta_k\|^2 = -M \|\theta_k\|^2,$

where the last inequality uses $0 \leq \nu_k \leq \tilde{\nu}$.

Thus, the result holds. □
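Lemma 1 is informative only when $M > 0$. The following short derivation, which uses only the standing requirement $0 < \tilde{\nu} < 1$, confirms that this always holds:

```latex
0 < \tilde{\nu} < 1
\;\Longrightarrow\; 1 < (1+\tilde{\nu})^2 < 4
\;\Longrightarrow\; \tfrac{1}{4} < \tfrac{1}{4}(1+\tilde{\nu})^2 < 1
\;\Longrightarrow\; 0 < M = 1 - \tfrac{1}{4}(1+\tilde{\nu})^2 < \tfrac{3}{4}.
```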

Lemma 2. The search direction $d_k$ generated by (2.1) satisfies the trust-region property:

$M \|\theta_k\| \leq \|d_k\| \leq N \|\theta_k\|,$

where $N = 1 + \frac{1}{\mu} + \frac{1}{\mu^2} + \frac{\tilde{\nu}}{\mu}$ .

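Proof. By Lemma 1 and the Cauchy-Schwarz inequality, $M \|\theta_k\|^2 \leq -\langle \theta_k, d_k \rangle \leq \|\theta_k\| \|d_k\|$, which gives

$\|d_k\| \geq M \|\theta_k\|.$

For the upper bound, $\|d_0\| = \|\theta_0\| \leq N \|\theta_0\|$ since $N > 1$. For $k \geq 1$, (2.1) and the triangle inequality give

$\|d_k\| = \|- \theta_k + \beta_k d_{k-1} + \varpi_k y_{k-1}\| \leq \|\theta_k\| + |\beta_k| \|d_{k-1}\| + |\varpi_k| \|y_{k-1}\|.$

Substituting the bounds (2.3) and (2.4) into the above inequality, we have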

$\|d_k\| \leq \left( 1 + \frac{1}{\mu} + \frac{1}{\mu^2} + \frac{\tilde{\nu}}{\mu} \right) \|\theta_k\|.$

Thus, the result holds. □
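To gauge the size of the constants in Lemmas 1 and 2 in practice, the sketch below evaluates $M$ and $N$ for the values $\mu = 0.02$ and $\tilde{\nu} = 0.105$ used in Section 4.1. The function name `descent_constants` is hypothetical.

```python
def descent_constants(mu, nu_tilde):
    """Return (M, N) from Lemmas 1 and 2 for mu > 0 and 0 < nu_tilde < 1."""
    M = 1.0 - 0.25 * (1.0 + nu_tilde) ** 2             # sufficient descent constant
    N = 1.0 + 1.0 / mu + 1.0 / mu ** 2 + nu_tilde / mu  # trust-region upper constant
    return M, N


M, N = descent_constants(0.02, 0.105)
print(f"M = {M:.8f}, N = {N:.2f}")  # prints M = 0.69474375, N = 2556.25
```

Because $N$ grows like $1/\mu^2$, the small value $\mu = 0.02$ allows $\|d_k\|$ to reach roughly 2556 times $\|\theta_k\|$ in the worst case, whereas $M$ depends only on $\tilde{\nu}$.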

2.2. Algorithm description

Before presenting the proposed algorithm, we describe its three building blocks: the line search, the projection operator, and the iterative update rule.

First, the line search determines the step length $t_k = \eta \rho^{i_k}$, where $i_k$ is the smallest non-negative integer $i$ satisfying

$\begin{equation} -\langle \theta(a_k + \eta \rho^i d_k), d_k \rangle \geq \sigma \eta \rho^i \| \theta(a_k + \eta \rho^i d_k) \| \|d_k\|^2, \end{equation}$

(2.7)

where $\eta > 0$ , $\rho \in (0, 1)$ , and $\sigma > 0$ are algorithmic parameters.
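A minimal Python sketch of the backtracking rule (2.7) follows. It tries $t = \eta, \eta\rho, \eta\rho^2, \ldots$ and returns the first step satisfying (2.7), together with the trial point $z_k$ and $\theta(z_k)$. The function name `line_search` and the safeguard `max_backtracks` are hypothetical additions (Lemma 3 below guarantees termination in exact arithmetic, but a cap is prudent in floating point), and the example mapping is Problem 1 of Section 4.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def line_search(theta, a, d, eta=1.0, rho=0.74, sigma=1e-4, max_backtracks=100):
    """Backtracking rule (2.7): smallest i >= 0 such that, with t = eta * rho**i,
    -<theta(a + t d), d> >= sigma * t * ||theta(a + t d)|| * ||d||^2."""
    dd = dot(d, d)
    t = eta
    for _ in range(max_backtracks):
        z = [ai + t * di for ai, di in zip(a, d)]
        tz = theta(z)
        if -dot(tz, d) >= sigma * t * math.sqrt(dot(tz, tz)) * dd:
            return t, z, tz
        t *= rho  # shrink the step: t = eta * rho**(i + 1)
    raise RuntimeError("line search (2.7) did not terminate")


def theta1(a):
    """Problem 1: theta_i(a) = exp(a_i) - 1."""
    return [math.exp(x) - 1.0 for x in a]


a = [1.0, 0.5]
d = [-g for g in theta1(a)]  # d_0 = -theta_0
t, z, tz = line_search(theta1, a, d)
print(t)
```

For this input, the trial steps $t = 1$ and $t = 0.74$ overshoot past the solution and are rejected, and $t = \rho^2 = 0.5476$ is accepted.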

Furthermore, the projection operator $P_{\Theta}[\cdot]$ is a critical component that ensures the iterative points remain within the feasible region $\Theta$ . Specifically, the projection of a point $a \in \mathbb{R}^n$ onto the set $\Theta$ is defined as

$P_{\Theta}[a] = \arg\min \{\|a - b\| : b \in \Theta\}, \quad a \in \mathbb{R}^n.$

This operator returns the point of $\Theta$ closest to $a$ in the Euclidean norm. Moreover, the projection operator is non-expansive, that is, for all $a, b \in \mathbb{R}^n$,

$\begin{equation} \| P_{\Theta}[a]-P_{\Theta}[b] \| \leq \| a-b \|. \end{equation}$

(2.8)
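When $\Theta$ is a box, such as the non-negative orthant $\mathbb{R}^n_+$ used in Problem 1 of Section 4, the projection reduces to componentwise clipping. The sketch below implements only this special case (the function name `project_box` is hypothetical; general convex sets need their own projection routine) and spot-checks the non-expansive property (2.8) on random points.

```python
import math
import random


def project_box(a, lower=0.0, upper=math.inf):
    """Euclidean projection onto {b : lower <= b_i <= upper}; the defaults give R^n_+."""
    return [min(max(ai, lower), upper) for ai in a]


def dist(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


rng = random.Random(0)
for _ in range(1000):
    a = [rng.uniform(-3, 3) for _ in range(5)]
    b = [rng.uniform(-3, 3) for _ in range(5)]
    # Non-expansiveness (2.8): ||P[a] - P[b]|| <= ||a - b||
    assert dist(project_box(a), project_box(b)) <= dist(a, b) + 1e-12
print(project_box([-1.0, 2.0, 0.5]))  # prints [0.0, 2.0, 0.5]
```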

Finally, the iterative update rule, the core of the proposed algorithm, specifies how the next iterate $a_{k+1}$ is computed from the current iterate $a_k$:

$\begin{equation} a_{k+1} = P_{\Theta}\left[ a_k - \gamma w_k \theta(z_k) \right], \quad w_k = \frac{\langle \theta(z_k), a_k-z_k\rangle}{\|\theta(z_k)\|^2}, \end{equation}$

(2.9)

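where $z_k = a_k + t_k d_k$ and $\gamma \in (0, 2)$ is a relaxation parameter. Geometrically, $a_k - w_k \theta(z_k)$ is the projection of $a_k$ onto the hyperplane $\{a : \langle \theta(z_k), a - z_k \rangle = 0\}$, which, by the monotonicity of $\theta$, separates $a_k$ from the solution set; the outer projection onto $\Theta$ keeps the new iterate feasible.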
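The update (2.9) can be sketched in Python as follows, assuming a projection routine for $\Theta$ is available (here, clipping onto $\mathbb{R}^n_+$ as in Problem 1). The function name `projection_update` and the choice $\gamma = 1$ are illustrative assumptions; the example checks that the distance to the solution $a_* = 0$ of Problem 1 does not increase.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def projection_update(a, z, theta_z, project, gamma=1.0):
    """Update (2.9): a_{k+1} = P[a_k - gamma * w_k * theta(z_k)],
    with w_k = <theta(z_k), a_k - z_k> / ||theta(z_k)||^2 and gamma in (0, 2)."""
    w = dot(theta_z, [ai - zi for ai, zi in zip(a, z)]) / dot(theta_z, theta_z)
    return project([ai - gamma * w * ti for ai, ti in zip(a, theta_z)])


def project_nonneg(a):
    """Projection onto R^n_+ (componentwise clipping at zero)."""
    return [max(ai, 0.0) for ai in a]


a = [1.0, 0.5]
z = [0.1, 0.2]                            # a trial point with <theta(z), a - z> > 0
theta_z = [math.exp(x) - 1.0 for x in z]  # Problem 1 evaluated at z
a_next = projection_update(a, z, theta_z, project_nonneg)
print(math.sqrt(dot(a, a)), math.sqrt(dot(a_next, a_next)))
```

Here the unprojected point has a negative second component, so the outer projection sets it to zero; the distance to $a_* = 0$ drops from about 1.118 to about 0.718.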

With these components in place, we present the detailed steps of an improved LS-RMIL-type conjugate gradient projection algorithm (ILR algorithm for short) as Algorithm 1.

Algorithm 1 An improved LS-RMIL-type conjugate gradient projection algorithm

1: Initialization: $a_0\in \mathbb{R}^n$, $\mu > 0$, $\tilde{\nu}\in(0, 1)$, $\eta, \sigma > 0$, $\rho\in(0, 1)$, $\gamma \in (0, 2)$, $\tau > 0$, and set $k := 0$.
2: while $\|\theta_k\| > \tau$ do
3:     Evaluate the parameters $\beta_k$ and $\varpi_k$ from (2.2) and the search direction $d_k$ from (2.1).
4:     Evaluate the step length $t_k$ from (2.7) and set the trial point $z_k = a_k+t_kd_k$.
5:     if $z_k\in\Theta$ and $\|\theta(z_k)\| < \tau$ then
6:         break.
7:     else
8:         Evaluate the next iterate $a_{k+1}$ from (2.9).
9:     end if
10:    Set $k := k+1$.
11: end while
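Putting the pieces together, the following Python sketch runs Algorithm 1 on Problem 1 of Section 4 ($\theta_i(a) = e^{a_i} - 1$, $\Theta = \mathbb{R}^n_+$). It is an illustrative, unoptimized implementation under stated assumptions: the function name `ilr_solve`, the value $\gamma = 1$ (the text specifies only $\gamma \in (0, 2)$), and the cap of 200 backtracking steps are choices made here, while `max_iter = 3000` mirrors the iteration cap of Section 4.1. Vectors are plain Python lists.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def ilr_solve(theta, project, a0, mu=0.02, nu_tilde=0.105, eta=1.0, sigma=1e-4,
              rho=0.74, tau=1e-5, gamma=1.0, max_iter=3000):
    """Sketch of Algorithm 1; returns (approximate solution, iterations used)."""
    a = list(a0)
    th = theta(a)
    d = [-g for g in th]                                  # d_0 = -theta_0
    a_prev = th_prev = d_prev = None
    for k in range(max_iter):
        if math.sqrt(dot(th, th)) <= tau:                 # step 2: stopping test
            return a, k
        if k >= 1:                                        # step 3: direction (2.1)-(2.2)
            y = [p - q for p, q in zip(th, th_prev)]
            s = [p - q for p, q in zip(a, a_prev)]
            c = max(mu * math.sqrt(dot(d_prev, d_prev) * dot(y, y)),
                    -dot(th_prev, d_prev), dot(d_prev, d_prev))
            nu_bar = dot(th, [p - q for p, q in zip(y, s)]) / dot(th, th)
            nu = min(nu_tilde, max(nu_bar, 0.0))
            gd = dot(th, d_prev)
            beta = dot(th, y) / c - dot(y, y) * gd / c ** 2
            varpi = nu * gd / c
            d = [-g + beta * dp + varpi * yi for g, dp, yi in zip(th, d_prev, y)]
        t, dd = eta, dot(d, d)                            # step 4: line search (2.7)
        for _ in range(200):
            z = [p + t * q for p, q in zip(a, d)]
            tz = theta(z)
            if -dot(tz, d) >= sigma * t * math.sqrt(dot(tz, tz)) * dd:
                break
            t *= rho
        else:
            raise RuntimeError("line search (2.7) did not terminate")
        ntz = math.sqrt(dot(tz, tz))
        if ntz < tau and project(z) == z:                 # step 5: z_k in Theta, small residual
            return z, k + 1
        w = dot(tz, [p - q for p, q in zip(a, z)]) / (ntz * ntz)
        a_prev, th_prev, d_prev = a, th, d                # step 8: update (2.9)
        a = project([p - gamma * w * q for p, q in zip(a, tz)])
        th = theta(a)
    return a, max_iter
```

The membership test `project(z) == z` in step 5 is exact for clipping-type projections such as the one onto $\mathbb{R}^n_+$; for a general closed convex set, a tolerance-based check would be needed.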

3. Properties analysis

In this section, we provide a comprehensive analysis of the global convergence properties of the ILR algorithm. To facilitate this analysis, we introduce the following key assumptions.

Assumption B:

(B1) The solution set $\Theta_*$ of problem (1.1) is non-empty.

(B2) The function $\theta(a)$ is monotone, i.e.,

$\langle \theta(a) - \theta(b), a - b \rangle \geq 0, \quad \forall a, b \in \mathbb{R}^n.$

Together with the continuity of $\theta$ assumed in Section 1, these assumptions underpin the convergence analysis of the ILR algorithm carried out below.

The following lemma demonstrates that the line search approach defined in (2.7) of the ILR algorithm is indeed well-defined and can be successfully applied in the iterative process.

Lemma 3. Consider the ILR algorithm at an iteration $k$ at which it has not terminated. Then there exists a non-negative integer $i$ satisfying (2.7), so the step length $t_k$ is well defined.

Proof. We argue by contradiction. Suppose there exists an index $k_0$, at which the algorithm has not terminated, such that inequality (2.7) fails for every $i \in \{0\} \cup \mathbb{N}$, that is,

$\begin{equation*} - \langle \theta(a_{k_0} + \eta \rho^i d_{k_0}), d_{k_0} \rangle < \sigma \eta \rho^i \| \theta(a_{k_0} + \eta \rho^i d_{k_0}) \| \|d_{k_0}\|^2. \end{equation*}$

By utilizing the continuity of $\theta$ and taking the limit as $i \to \infty$ , the above inequality yields:

$\begin{equation} - \langle \theta(a_{k_0}), d_{k_0} \rangle \leq 0. \end{equation}$

(3.1)

On the other hand, since the algorithm has not terminated, $\|\theta(a_{k_0})\| > \tau > 0$, and Lemma 1 gives

$\begin{equation*} -\langle \theta(a_{k_0}), d_{k_0} \rangle \geq M \| \theta(a_{k_0}) \|^2 > 0, \end{equation*}$

which contradicts (3.1). Hence (2.7) holds for some finite $i$, and $t_k$ is well defined. □

The following lemma shows that the sequence $\{a_k\}$ generated by the ILR algorithm is Fejér monotone with respect to the solution set $\Theta_*$ of problem (1.1), that is, its distance to every solution is non-increasing.

Lemma 4. Consider the sequences $\{a_k\}$ and $\{z_k\}$ generated by the ILR algorithm. Then, the following properties hold:

(i) The sequence $\{a_k\}$ is bounded, meaning that there exists a constant $D > 0$ such that $\|a_k\|\leq D$ for all $k \geq 0$ .

(ii) The distance between the trial points and the iterates vanishes, i.e., $\lim\limits_{k \to \infty} \|z_k-a_k\| = 0$.

Proof. From the definition of the projection operator $P_{\Theta}[\cdot]$, the fact that $P_\Theta[a_*] = a_*$ for any solution $a_* \in \Theta$, and the non-expansive property (2.8), we obtain

$\begin{equation} \begin{array}{lll} \|a_{k+1} - a_\ast \|^2 & = & \| P_\Theta\left[ a_k - \gamma w_k \theta(z_k) \right] - P_\Theta[a_\ast]\|^2 \\ & \leq & \|a_k - \gamma w_k \theta(z_k) - a_\ast \|^2 \\ & = & \|a_k - a_\ast \|^2 - 2 \gamma w_k \langle \theta(z_k), a_k - a_\ast \rangle + \gamma^2 w_k^2 \|\theta(z_k)\|^2, \end{array} \end{equation}$

(3.2)

where $a_*$ denotes a solution of problem (1.1). Next, since $\theta(a_*) = 0$, Assumption B2, the definition of $z_k$, and the line search (2.7) yield

$\begin{equation} \begin{array}{lll} \langle \theta(z_k), a_k - a_\ast \rangle & = & \langle \theta(z_k), a_k - z_k \rangle + \langle \theta(z_k), z_k - a_\ast \rangle - \langle \theta(a_*), z_k-a_* \rangle \\ & \geq & \langle \theta(z_k), a_k - z_k \rangle \\ & \geq & \sigma t_k^2 \|\theta(z_k)\| \|d_k\|^2. \end{array} \end{equation}$

(3.3)

Combining (3.2) and (3.3) with the definition of $w_k$, and noting that $w_k \geq 0$ by (3.3), we derive

$\begin{equation} \begin{array}{lll} \|a_{k+1} - a_\ast \|^2 & \leq & \|a_k - a_\ast \|^2 - 2 \gamma w_k \langle \theta(z_k), a_k - z_k \rangle + \gamma^2 w_k^2 \|\theta(z_k)\|^2, \\ & = & \|a_k - a_\ast \|^2 - 2 \gamma w_k^2 \|\theta(z_k)\|^2 + \gamma^2 w_k^2 \|\theta(z_k)\|^2, \\ & = & \|a_k - a_\ast \|^2 - (2\gamma-\gamma^2) w_k^2 \|\theta(z_k)\|^2, \end{array} \end{equation}$

(3.4)

Given the definition of $w_k$ and (3.3), we have

$\|\theta(z_k)\|^2 w_k = \langle \theta(z_k), a_k - z_k \rangle \geq \sigma t_k^2 \|\theta(z_k)\| \|d_k\|^2,$

which implies that $\|\theta(z_k)\| w_k \geq \sigma t_k^2 \|d_k\|^2$ . Substituting this into (3.4), we obtain

$\begin{equation} \begin{array}{lll} \|a_{k+1} - a_\ast \|^2 & \leq & \|a_k - a_\ast \|^2 - (2\gamma-\gamma^2) (\sigma t_k^2 \|d_k\|^2)^2, \\ & = & \|a_k - a_\ast \|^2 - (2\gamma-\gamma^2) \sigma^2 t_k^4 \|d_k\|^4, \\ & = & \|a_k - a_\ast \|^2 - (2\gamma-\gamma^2) \sigma^2 \|a_k-z_k\|^4. \end{array} \end{equation}$

(3.5)

This shows that the sequence $\{\|a_k-a_*\|\}$ is non-increasing. Consequently, $\|a_k\| \leq \|a_k - a_*\| + \|a_*\| \leq \|a_0 - a_*\| + \|a_*\|$ for all $k \geq 0$, so (i) holds with $D = \|a_0 - a_*\| + \|a_*\|$.

Summing (3.5) over $k$ and telescoping, we obtain

$\begin{equation} \begin{array}{lll} (2\gamma-\gamma^2) \sigma^2 \sum\limits_{k = 0}^{\infty} \|a_k-z_k\|^4 & \leq & \sum\limits_{k = 0}^{\infty} \left(\|a_k - a_\ast \|^2 - \|a_{k+1} - a_\ast \|^2\right) \\ & \leq & \|a_0 - a_\ast \|^2. \end{array} \end{equation}$

(3.6)

Since $2\gamma - \gamma^2 > 0$ for $\gamma \in (0, 2)$, the series on the left converges, which implies $\lim\limits_{k \to \infty} \|z_k-a_k\| = 0.$ □

Theorem 1. Consider the sequence $\{\theta_k\}$ generated by the ILR algorithm. Then

$\begin{equation} \lim\limits_{k \to \infty} \| \theta_k \| = 0. \end{equation}$

(3.7)

Proof. We argue by contradiction. Suppose that there exists a constant $A_1 > 0$ such that $\|\theta_k\| > A_1$ for all $k\geq0$. Combined with Lemma 2, this gives $\|d_k\| \geq M \|\theta_k\| > M A_1$ for all $k\geq0$. Since $\theta(a)$ is continuous and $\{a_k\}$ is bounded, the sequence $\{\theta_k\}$ is also bounded, i.e., there exists a constant $A_2 > 0$ such that $\|\theta_k\| \leq A_2$ for all $k \geq 0$. Lemma 2 then yields $\|d_k\| \leq N\|\theta_k\| \leq N A_2$ for all $k\geq0$. Hence $M A_1 < \|d_k\| \leq N A_2$, so $\{d_k\}$ is bounded and bounded away from zero. By Lemma 4(ii) and the definition of $z_k$, $\lim_{k \to \infty} t_k \| d_k\| = \lim_{k \to \infty} \|z_k-a_k\| = 0$, and since $\|d_k\| > M A_1$, it follows that $\lim_{k \to \infty} t_k = 0$.

Since the sequences $\{a_k\}$ and $\{d_k\}$ are both bounded, we can extract two convergent subsequences, $\{a_{k_i}\}$ and $\{d_{k_i}\}$ , such that $\lim_{i \to \infty, i \in \mathcal{K}} a_{k_i} = \bar{a}$ and $\lim_{i \to \infty, i \in \mathcal{K}} d_{k_i} = \bar{d}$ , where $\mathcal{K}$ denotes an infinite index set. Utilizing Lemma 1, we have $-\langle \theta_{k_i}, d_{k_i} \rangle \geq M \|\theta_{k_i}\|^2.$ Taking the limit as $i \to \infty$ in the above inequality and invoking the continuity of $\theta(a)$ , we obtain

$-\langle \theta(\bar{a}), \bar{d} \rangle \geq M\|\theta(\bar{a})\|^2 \geq M A_1^2 > 0.$

Furthermore, since $t_{k_i} \to 0$, we have $t_{k_i} < \eta$ for all sufficiently large $i$, so the larger trial step $\rho^{-1} t_{k_i}$ was rejected by the line search (2.7), that is, $-\langle \theta(a_{k_i} + \rho^{-1}t_{k_i}d_{k_i}), d_{k_i} \rangle < \sigma \rho^{-1} t_{k_i}\|\theta(a_{k_i} + \rho^{-1} t_{k_i}d_{k_i})\|\|d_{k_i}\|^2$. Taking the limit as $i \to \infty$ in this inequality and using the continuity of $\theta(a)$, we conclude

$-\langle \theta(\bar{a}), \bar{d} \rangle \leq 0.$

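These two results contradict each other. Therefore, no such $A_1$ exists, and since the same argument applies to every tail $\{\theta_k\}_{k \geq K}$, we have $\liminf_{k \to \infty} \|\theta_k\| = 0$. To obtain the full limit (3.7), take a subsequence along which $\|\theta_k\| \to 0$; by boundedness it has a further subsequence converging to some $\hat{a}$, which lies in $\Theta$ because $a_k \in \Theta$ for $k \geq 1$ and $\Theta$ is closed, and which satisfies $\theta(\hat{a}) = 0$ by continuity. Thus $\hat{a} \in \Theta_*$, and applying (3.5) with $a_* = \hat{a}$ shows that $\{\|a_k - \hat{a}\|\}$ is non-increasing and has a subsequence tending to zero. Hence $a_k \to \hat{a}$, and $\|\theta_k\| \to 0$ by continuity. □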

4. Numerical experiments for systems of nonlinear equations

In this section, we evaluate the proposed ILR algorithm through numerical experiments on large-scale systems of nonlinear equations with convex constraints, in order to assess its computational efficiency. For benchmarking, we compare the ILR algorithm with two established methods, VRMILP and DFPRPMHS, across various test problems, initial points, and problem dimensions.

4.1. General setups

We apply the ILR algorithm to large-scale systems of nonlinear equations with convex constraints and compare it with two existing algorithms: the VRMILP algorithm ^[26] and the DFPRPMHS algorithm ^[27]. All experiments are run on a 64-bit Ubuntu 20.04.2 LTS operating system with an Intel(R) Xeon(R) Gold 5115 2.40GHz CPU. The parameters for the ILR algorithm are set as follows:

$\mu = 0.02, \quad \tilde{\nu} = 0.105, \quad \eta = 1, \quad \sigma = 10^{-4}, \quad \rho = 0.74, \quad \tau = 10^{-5}.$

For the VRMILP and DFPRPMHS algorithms, we adopt the parameter settings given in their original works. Seven test problems are selected for evaluation, with problem dimensions $n \in \{5{,}000, 10{,}000, 50{,}000, 100{,}000, 150{,}000\}$. Each test problem is solved from the following eight initial points: $a_1 = \left(\frac{1}{2}, \frac{1}{2^2}, \ldots, \frac{1}{2^n}\right)$, $a_2 = \left(0, \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n-1}{n}\right)$, $a_3 = (1, \frac{1}{2}, \ldots, \frac{1}{n})$, $a_4 = \left(\frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n}\right)$, $a_5 = \left(\frac{1}{3}, \frac{1}{3^2}, \ldots, \frac{1}{3^n}\right)$, $a_6 = (2, 2, \ldots, 2)$, $a_7 = \left(1-\frac{1}{n}, 1-\frac{2}{n}, \ldots, 1-\frac{n}{n}\right)$, and $a_8 \in [0, 1]^n$. For all algorithms, the iteration stops when $\|\theta_k\| \leq \tau$ or after a maximum of 3000 iterations. Here, $\theta(a) = (\theta_1(a), \theta_2(a), \ldots, \theta_n(a))^\text{T}$ with $a = (a_1, a_2, \ldots, a_n)^\text{T}$. The seven test problems are described as follows:
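The eight initial points can be generated as in the following sketch. The helper `initial_points` is hypothetical, and since the text specifies only $a_8 \in [0, 1]^n$, drawing $a_8$ uniformly at random with a fixed seed is an assumption made here. Powers are computed in floating point, so entries of $a_1$ and $a_5$ underflow to zero for large $n$.

```python
import random


def initial_points(n, seed=0):
    """Initial points a_1, ..., a_8 from Section 4.1 as lists of length n.

    a_8 is only specified as a point of [0, 1]^n; sampling it uniformly
    with a fixed seed is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return {
        "a1": [0.5 ** i for i in range(1, n + 1)],          # (1/2, 1/2^2, ..., 1/2^n)
        "a2": [i / n for i in range(n)],                    # (0, 1/n, ..., (n-1)/n)
        "a3": [1.0 / i for i in range(1, n + 1)],           # (1, 1/2, ..., 1/n)
        "a4": [i / n for i in range(1, n + 1)],             # (1/n, 2/n, ..., n/n)
        "a5": [(1.0 / 3.0) ** i for i in range(1, n + 1)],  # (1/3, 1/3^2, ..., 1/3^n)
        "a6": [2.0] * n,                                    # (2, 2, ..., 2)
        "a7": [1.0 - i / n for i in range(1, n + 1)],       # (1 - 1/n, ..., 1 - n/n)
        "a8": [rng.random() for _ in range(n)],             # a point of [0, 1]^n
    }


pts = initial_points(4)
print(pts["a2"], pts["a7"])  # prints [0.0, 0.25, 0.5, 0.75] [0.75, 0.5, 0.25, 0.0]
```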

Problem 1 ^[7]:

$\theta_i(a) = e^{a_i}-1,\; \; \; \text{for}\; \; i = 1,2,\ldots,n,$

with the constraint set $\Theta = \mathbb{R}^n_+$ . The unique solution is $a_* = (0, 0, \ldots, 0)^\text{T}$ .
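For reference, Problem 1 can be coded directly. The sketch below (function names are hypothetical) evaluates $\theta$ and spot-checks the monotonicity condition from Section 1 on random pairs; here monotonicity holds because each component $e^{a_i} - 1$ is increasing in $a_i$, so every term $(e^{a_i} - e^{b_i})(a_i - b_i)$ is non-negative.

```python
import math
import random


def problem1(a):
    """Problem 1: theta_i(a) = exp(a_i) - 1, i = 1, ..., n."""
    return [math.exp(x) - 1.0 for x in a]


def monotonicity_gap(theta, a, b):
    """<theta(a) - theta(b), a - b>; non-negative for a monotone mapping."""
    ta, tb = theta(a), theta(b)
    return sum((p - q) * (x - y) for p, q, x, y in zip(ta, tb, a, b))


rng = random.Random(0)
worst = min(
    monotonicity_gap(problem1,
                     [rng.uniform(-2, 2) for _ in range(5)],
                     [rng.uniform(-2, 2) for _ in range(5)])
    for _ in range(1000)
)
print(worst, problem1([0.0, 0.0, 0.0]))
```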

Problem 2 ^[7]:

$\theta_i(a) = \frac{i}{n}e^{a_i}-1,\; \; \; \text{for}\; \; i = 1,2,\ldots,n,$