1. Introduction
In this paper, we study the split feasibility problem (SFP), which is defined on two nonempty closed and convex subsets C and Q of real Hilbert spaces H1 and H2, respectively, where A:H1→H2 is a bounded linear operator. The SFP is to
find μ∗∈C such that Aμ∗∈Q, (1.1)
if such a μ∗ exists. The set Ω:={μ∗∈C : Aμ∗∈Q} denotes the solution set of the SFP (1.1).
In 1994, Censor and Elfving [8] first introduced an algorithm for solving the SFP (1.1); it requires computing the inverse A−1 of the operator at each iteration. Since then, many mathematicians (see [3,9,10,14,34,37]) have applied the SFP (1.1) to real-world problems such as signal and image processing, automatic control systems, machine learning, and many more.
Byrne [7] was the first to propose the popular CQ algorithm for solving the SFP (1.1), which generates a sequence {μn} by the recursive procedure
μn+1 = PC(μn − λAᵀ(I−PQ)Aμn), (1.2)
where λ ∈ (0, 2/‖A‖²) and PC, PQ are the metric projections onto C and Q, respectively. Another famous algorithm for convex minimization problems is the gradient projection algorithm (GPA), which is generated as follows:
μn+1 = PC(μn − λn∇f(μn)), (1.3)
where f:H1→(−∞,+∞] is a lower semicontinuous convex function and λn, the stepsize at iteration n, is chosen in the interval (0, 2/L), where L is the Lipschitz constant of ∇f. It is well known that the GPA (1.3) reduces to solving the SFP (1.1) by setting f(μ) := (1/2)‖(I−PQ)Aμ‖², with ∇f(μ) = Aᵀ(I−PQ)Aμ. The Lipschitz condition is required for the stepsize λn of the algorithms (1.2) and (1.3), that is, λn ∈ (0, 2/‖A‖²). This means that computing the CQ algorithm requires the matrix norm of A, which is generally not easy work in practice.
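For concreteness, here is a minimal numerical sketch of the CQ iteration (1.2) in finite dimensions. The choice of C and Q as Euclidean balls, and all names and the toy instance, are illustrative assumptions of ours, not the paper's:

```python
import numpy as np

def project_ball(x, center, radius):
    # Metric projection onto a Euclidean ball, a simple example of P_C, P_Q.
    d = x - center
    dist = np.linalg.norm(d)
    return x if dist <= radius else center + radius * d / dist

def cq_algorithm(A, proj_C, proj_Q, mu0, n_iter=200):
    # Byrne's CQ iteration (1.2): mu_{k+1} = P_C(mu_k - lam * A^T (I - P_Q) A mu_k),
    # with a fixed stepsize lam in (0, 2/||A||^2); ||A||^2 is the largest
    # eigenvalue of A^T A, i.e. the squared spectral norm of A.
    lam = 1.0 / np.linalg.norm(A, 2) ** 2  # safely inside (0, 2/||A||^2)
    mu = mu0.copy()
    for _ in range(n_iter):
        Amu = A @ mu
        grad = A.T @ (Amu - proj_Q(Amu))   # grad f(mu) = A^T (I - P_Q) A mu
        mu = proj_C(mu - lam * grad)
    return mu

# Toy instance: C and Q are unit balls, A is a random 5 x 8 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
mu = cq_algorithm(A,
                  lambda x: project_ball(x, np.zeros(8), 1.0),
                  lambda y: project_ball(y, np.zeros(5), 1.0),
                  rng.standard_normal(8))
```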
Later on, Byrne [7] presented a different stepsize {λn} that avoids computing matrix norms. Yang [41] was also interested in stepsizes {λn} with no connection to matrix norms, and considered the GPA (1.3) for the variational inequality problem. Since then, many different stepsizes {λn} have been presented by many mathematicians; see [22,35,36,41].
Another stepsize was presented in 2018 by Pham et al. [2]; it is generated as follows:
where
The algorithm (1.2) with the stepsize (1.4) was used to solve the SFP (1.1). For recent results on the SFP with the stepsize (1.4), see [13,19,23,38,43].
Finding ways to make algorithms converge faster is another topic many authors are interested in studying. The inertial technique is one way of solving the smooth convex minimization problem; it was first proposed by Polyak [27] in 1964. Polyak's algorithm, called the heavy ball method, is a two-step iterative method in which the next iterate is defined using the previous two iterates. Later on, the heavy ball method was improved by Nesterov [25] to speed up the rate of convergence. It is notable that the inertial term dramatically improves an algorithm's performance and gives nice convergence properties (see [10]). Since then, the heavy ball method has been widely used to solve a wide variety of problems in optimization, as seen in [12,24,30,33].
In 2020, Sahu et al. [28] proposed an inertial relaxed CQ algorithm for solving the SFP (1.1) in a real Hilbert space by combining the inertial technique of Alvarez and Attouch [1] with the Byrne algorithm (1.2). This algorithm is generated as follows:
where the stepsize parameter λ still lies in an interval involving the norm of the operator A, and the extrapolation factor satisfies σn ∈ [0, σ̄n] with σ ∈ [0,1) such that
The weak convergence of the sequence {μn} generated by (1.5) was proved under the conditions on the extrapolation factor (1.6) and the stepsize parameter λ.
The development of inertial techniques has received significant attention. Subsequently, Beck and Teboulle [5] introduced the well-known fast iterative shrinkage-thresholding algorithm (FISTA). The algorithm is designed by choosing t1 = 1, λ > 0 and computing
FISTA has received a lot of attention because of its excellent computational results, and many mathematicians have implemented it in a wide range of applications (see [21] and the references therein). However, this inertial technique is restrictive in how the sequence {σn} can be computed.
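Since FISTA also serves as a baseline in Section 4, the following sketch shows its inertial schedule; the problem-specific forward step is abstracted as a user-supplied `step` function, which is our assumption rather than the paper's notation:

```python
import numpy as np

def fista(step, mu0, n_iter=100):
    # FISTA inertia (Beck and Teboulle): t_1 = 1,
    # t_{n+1} = (1 + sqrt(1 + 4 t_n^2)) / 2 and sigma_n = (t_n - 1) / t_{n+1}.
    t = 1.0
    mu_prev = mu0.copy()
    nu = mu0.copy()                   # extrapolated point
    for _ in range(n_iter):
        mu = step(nu)                 # e.g. P_C(nu - lam * grad f(nu))
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        sigma = (t - 1.0) / t_next    # inertial (extrapolation) factor
        nu = mu + sigma * (mu - mu_prev)
        mu_prev, t = mu, t_next
    return mu_prev
```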
Motivated by the limited choice of the parameter σn in Beck and Teboulle [5], Gibali et al. [17] introduced the following inertial relaxed CQ algorithm (IRCQA) in a real Hilbert space. This algorithm is generated as follows:
They proved that, if λn = τn fn(μn)/ηn², where ηn = max{1, ‖∇fn(μn)‖} and σn ∈ [0, σ̄n], where
such that ∑_{n=0}^∞ σn‖μn − μn−1‖² < ∞, then the sequence {μn} generated by (1.8) converges weakly to an element of the solution set of the SFP (1.1). The advantage of the IRCQA (1.8) is that the extrapolation factor {σn} can be chosen in many ways under the control condition (1.9), and the stepsize {λn} is built without the matrix norm.
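As an illustration, one IRCQA step with the norm-free stepsize above can be sketched as follows; the projections onto Cn and Qn are passed in as functions, and the fixed value `tau` stands in for the relaxation sequence τn (our simplification):

```python
import numpy as np

def ircqa_step(nu, A, proj_Qn, proj_Cn, tau=1.0):
    # Relaxed CQ step with lam_n = tau_n * f_n(nu_n) / eta_n^2,
    # where eta_n = max{1, ||grad f_n(nu_n)||}.
    Anu = A @ nu
    residual = Anu - proj_Qn(Anu)      # (I - P_{Q_n}) A nu_n
    f_val = 0.5 * residual @ residual  # f_n(nu_n) = (1/2)||(I - P_{Q_n}) A nu_n||^2
    grad = A.T @ residual              # grad f_n(nu_n)
    eta = max(1.0, np.linalg.norm(grad))
    lam = tau * f_val / eta ** 2       # stepsize built without ||A||
    return proj_Cn(nu - lam * grad)
```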
In this paper, we propose an inertial Mann relaxed CQ algorithm to solve the split feasibility problem in Hilbert spaces. Our work is inspired by the iterative methods developed by Dang et al. [10] and Gibali et al. [17]. We apply our main result to a data classification problem in machine learning and compare the performance of our algorithm with FISTA and IRCQA.
2. Preliminaries
Let H1 and H2 be real Hilbert spaces. The strong (weak) convergence of a sequence {μn} to μ is denoted by μn→μ (μn⇀μ), respectively. Given a bounded linear operator A:H1→H2, Aᵀ denotes the adjoint of A. For any sequence {μn}⊂H1, ω_w(μn) denotes the weak limit set of {μn}, that is,
Let C be a nonempty closed and convex subset of a real Hilbert space H1. The metric projection from H1 onto C is defined as follows: for each μ∈H1, there exists a unique x∗∈C such that ‖μ − x∗‖ ≤ ‖μ − y‖ for all y∈C.
This x∗ is denoted by PCμ, and the map PC is called the metric projection from H1 onto C.
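For many common sets C the metric projection has a closed form; two standard examples, written as a small sketch of ours, are a box and a half-space:

```python
import numpy as np

def project_box(mu, lo, hi):
    # P_C for a box C = {x : lo <= x <= hi}: clip coordinatewise.
    return np.clip(mu, lo, hi)

def project_halfspace(mu, u, b):
    # P_C for a half-space C = {x : <u, x> <= b}: shift along u only if outside.
    excess = np.dot(u, mu) - b
    if excess <= 0:
        return mu
    return mu - (excess / np.dot(u, u)) * u
```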
Lemma 2.1. [35] Let f:H1→R be a function defined by
Then the following assertions hold:
(i) f is convex and differentiable;
(ii) f is weakly lower semicontinuous on H1;
(iii) ∇f(μ)=AT(I−PQ)Aμ for all μ∈H1;
(iv) ∇f is 1/‖A‖²-inverse strongly monotone, that is, ⟨∇f(μ) − ∇f(ν), μ − ν⟩ ≥ (1/‖A‖²)‖∇f(μ) − ∇f(ν)‖² for all μ, ν ∈ H1.
Lemma 2.2. [1] Let {κn}, {δn} and {αn} be sequences in [0,+∞) such that κn+1 ≤ κn + αn(κn − κn−1) + δn for all n≥1, ∑_{n=1}^∞ δn < +∞, and there exists a real number α with 0 ≤ αn ≤ α < 1 for all n≥1. Then the following hold:
(i) ∑_{n≥1} [κn − κn−1]+ < +∞, where [t]+ = max{t,0};
(ii) There exists κ∗∈[0,+∞) such that lim_{n→+∞} κn = κ∗.
Lemma 2.3. [40] Consider the SFP (1.1) with the function f as in Lemma 2.1, and let λ>0 and μ∗∈H1. The point μ∗ solves the SFP (1.1) if and only if μ∗ solves the fixed point equation μ∗ = PC(μ∗ − λ∇f(μ∗)).
Lemma 2.4. [26] Let {μn} be a sequence in a real Hilbert space H1 such that there exists a nonempty closed and convex subset Ω of H1 satisfying:
lim_{n→∞} ‖μn − μ‖ exists for all μ∈Ω, and any weak cluster point of {μn} belongs to Ω.
Then there exists μ∗∈Ω such that μn⇀μ∗.
Lemma 2.5. [32] Let X be a Banach space satisfying Opial's condition and let {μn} be a sequence in X. Let u,v∈X be such that
If {μnk} and {μmk} are subsequences of {μn} which converge weakly to u and v, respectively, then u=v.
3. Main results
In this section, we introduce an inertial Mann relaxed CQ algorithm for solving the SFP (1.1). Let C and Q be nonempty closed and convex subsets of real Hilbert spaces H1 and H2, respectively, such that
where c:H1→R and q:H2→R are lower semicontinuous convex functions. We also assume that ∂c and ∂q are bounded operators. For a sequence {νn} in H1, we define the half-spaces Cn and Qn as follows:
where un∈ ∂c(νn), and
where vn∈∂q(Aνn) and A:H1→H2 is a bounded linear operator. We see that C⊆Cn and Q⊆Qn for each n≥1. Define
Hence, we have
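The displayed definitions of Cn, Qn and fn are given above. Assuming they follow the standard relaxed CQ pattern Cn = {x ∈ H1 : c(νn) + ⟨un, x − νn⟩ ≤ 0} (and analogously for Qn), the projection onto such a half-space has the closed form sketched below; this is our reconstruction under that assumption, not code from the paper:

```python
import numpy as np

def project_relaxed_halfspace(x, nu, c_val, u):
    # Projection onto C_n = {z : c(nu_n) + <u_n, z - nu_n> <= 0},
    # where u_n is a subgradient of c at nu_n.  Closed form:
    # P(x) = x - max(0, c(nu_n) + <u_n, x - nu_n>) / ||u_n||^2 * u_n.
    g = c_val + np.dot(u, x - nu)
    if g <= 0 or not np.any(u):   # inside the half-space, or u_n = 0 (then c(nu_n) <= 0)
        return x
    return x - (g / np.dot(u, u)) * u
```

The computational appeal of the relaxation is precisely that PCn and PQn are given by this explicit formula, whereas PC and PQ themselves may have no closed form.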
Our algorithm is defined as follows:
Assume that the following conditions hold:
Lemma 3.1. Let {μn} be the sequence generated by Algorithm 3.1. Assume that the conditions (3.7)–(3.9) hold. Then we have the following conclusions:
(i) ⟨∇fn(νn),νn−μ∗⟩≥2fn(νn) for all μ∗∈Ω and n∈N.
(ii) ‖μn+1 − μ∗‖² ≤ ‖νn − μ∗‖² − 4λnαn(1 − (1/2)λn‖A‖²) fn(νn) for all μ∗∈Ω.
(iii) If lim_{n→∞} ‖μn − μ∗‖ exists and ∑_{n=1}^∞ [‖μn − μ∗‖² − ‖μn−1 − μ∗‖²]+ < ∞ for all μ∗∈Ω, then we have
(a) {μn}, {νn} and {∇fn(νn)} are bounded,
(b) ‖μn+1−μn‖→0.
Proof. (i) Let μ∗∈Ω and let A∗ be the adjoint operator of A. Since C⊆Cn and Q⊆Qn, we have μ∗ = PC(μ∗) = PCn(μ∗) and (I−PQ)(Aμ∗) = (I−PQn)(Aμ∗) = 0. Since (I−PQn) is firmly nonexpansive, for each n∈N, we have
(ii) Let μ∗∈Ω. Set tn = νn − λn∇fn(νn); then we have
From part (i), we get
(iii) Let μ∗∈Ω. Suppose that lim_{n→∞} ‖μn − μ∗‖ exists, (3.7) holds, and ∑_{n=1}^∞ [‖μn − μ∗‖² − ‖μn−1 − μ∗‖²]+ < ∞. Then we have
On the other hand, for each n∈N,
From (3.11) and (3.12), we have
Since {μn} is bounded, it follows from (3.12) that {νn} is also bounded. Since ∇fn is ‖A‖²-Lipschitz continuous, we have
Hence {∇fn(νn)} is also bounded.
Since λn ∈ (0, 2/‖A‖²), we have
This implies that
It follows from the existence of lim_{n→∞} ‖μn − μ∗‖ and (3.7) that
We next show that ‖μn+1 − μn‖ → 0. It follows from (3.14) that
By ∑_{n=1}^∞ [‖μn − μ∗‖² − ‖μn−1 − μ∗‖²]+ < ∞ and ∑_{n=1}^∞ σn‖μn − μn−1‖² < ∞, it follows from (3.13) and (3.15) that
On the other hand, from (3.7), we have
From (3.16) and (3.17), we get
This completes the proof. □
Let {μn} be the sequence defined by Algorithm 3.1. We next prove that there exists a subsequence {μnj} of {μn} that converges weakly to a solution of the SFP (1.1).
Theorem 3.2. Let H1 and H2 be two real Hilbert spaces, and let C and Q be nonempty closed convex subsets of H1 and H2, respectively. Let A:H1→H2 be a bounded linear operator. Assume that the solution set Ω of the SFP (1.1) is nonempty, the condition (3.7) holds, and {λn}, {αn} satisfy the condition (3.8). Let {μn} be a sequence generated by Algorithm 3.1. Then we have the following:
(i) {μn}, {νn} and {∇fn(νn)} are bounded;
(ii) There exists a subsequence {μnj} of {μn} converging weakly to a point μ∗∈Ω;
(iii) The sequence {μn} converges weakly to a point μ∗∈Ω.
Proof. Let μ∗∈Ω. From Lemma 3.1 (ii), there exists m∈N such that
From (3.12) and (3.18), we have
Since 4λnαn(1 − (1/2)λn‖A‖²) fn(νn) ≥ 0, from (3.19) we have
which implies that, for each n≥m,
From (3.10), we have
Applying Lemma 2.2 to (3.21) with κn = ‖μn − μ∗‖² and δn = 2σn‖μn − μn−1‖², we obtain that lim_{n→∞} ‖μn − μ∗‖ exists and ∑_{n≥m} [‖μn − μ∗‖² − ‖μn−1 − μ∗‖²]+ < ∞. It then follows from Lemma 3.1 (iii) that {μn}, {νn} and {∇fn(νn)} are bounded. Since {∇fn(νn)} is bounded, it follows from (3.22) and conditions (3.7)–(3.9) that
Since {νnj} is bounded, there exists a subsequence {νnjm} of {νnj} which converges weakly to μ∗. Since PQnjAνnj∈Qnj, we have
where vnj∈∂q(Aνnj). Since ∂q is bounded, {vnj} is also bounded. From (3.24), we have
It follows from the assumption on q that
which means that Aμ∗∈Q. By Lemma 3.1 (iii), we have
Note that znj∈Cnj. By the definition of Cnj, we get
where unj∈∂c(νnj). Since ∂c is bounded, we see that {unj} is bounded. From (3.14), we have
Similarly, from (3.17), we obtain c(μ∗) ≤ 0, i.e., μ∗∈C. Therefore, μnj ⇀ μ∗ ∈ Ω.
Since {μn} is bounded and H1 is reflexive, ω_w(μn) is nonempty. Let p∈ω_w(μn) be an arbitrary element. Then there exists a subsequence {μnk} of {μn} such that μnk⇀p. Let q∈ω_w(μn) and {μni}⊆{μn} be such that μni⇀q. From (ii), we have p,q∈Ω. By Lemma 2.5, p=q. Applying Lemma 2.4 and Lemma 3.1 (iii), there exists μ∗∈Ω such that μn⇀μ∗. □
For the convergence of Algorithm 3.1, the parameter {λn} needs to satisfy the Lipschitz-type condition λn ∈ (0, 2/‖A‖²); within this interval, Algorithm 3.1 is flexible in the choice of {λn}. For example, applying the stepsizes (1.6) and (1.9) of Dang et al. [10] and Gibali et al. [17], respectively, we present new stepsize update rules in the following Algorithm 3.3 and Algorithm 3.4:
Remark 3.1. From Algorithms 3.3 and 3.4, it is easy to see that the stepsize λn is a nonincreasing sequence in (0, 2/‖A‖²) and satisfies the condition (3.8).
4. Application to data classification problem
Currently, cardiovascular disease is the leading cause of death. The World Health Organization (WHO) reported 17.9 million deaths from cardiovascular diseases in 2019, estimated to be 32% of all global deaths that year [39]. In Thailand [11], cardiovascular disease is the number one cause of death and is increasing in all age groups. Monitoring the heart condition at regular intervals and detecting problems at an early stage are therefore needed to control life-threatening situations due to heart failure. To predict heart disease, the UCI Machine Learning Heart Disease dataset, available on the Internet at [15], was used to evaluate the proposed model. The dataset comprises 76 characteristics and 303 records; however, only 14 attributes were used for training and testing: Age, Gender, CP, Trestbps, Chol, Fbs, Restecg, Thalach, Exang, Oldpeak, Slope, Ca, Thal, and Num (the target variable). The dataset consists of 138 normal instances versus 165 abnormal instances. Table 1 shows a visualization of the dataset.
In 2021, Bharti et al. [6] compared different machine learning algorithms on the UCI Machine Learning Heart Disease dataset, with feature selection and normalization for better accuracy. In this section, we apply our Algorithms 3.1, 3.3, and 3.4 to optimize the weight parameter when training an extreme learning machine (ELM), using 5-fold cross-validation [20]. Very recently, Sarnmeta et al. [29] also considered the UCI Machine Learning Heart Disease dataset, using an accelerated forward-backward algorithm with a linesearch technique for convex minimization problems in ELM with 10-fold cross-validation. Table 2 shows the efficiency of our algorithm in the extreme learning machine on the original dataset, compared with the existing machine learning methods presented in Bharti et al. [6] and the ELM algorithm in Sarnmeta et al. [29].
For our machine learning classification process, we begin by letting U := {(μs, rs) : μs∈Rn, rs∈Rm, s = 1,2,...,N} be a training set of N distinct samples, where μs is an input training datum and rs is a target. The output function of the ELM for single-hidden layer feedforward neural networks (SLFNs) [16,42] with M hidden nodes and activation function V is
where ci and ei are the weight and bias parameters, respectively. To find the optimal output weight wi at the i-th hidden node, the hidden layer output matrix A is generated as follows:
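As a sketch of the standard ELM construction (names and the random draw are our assumptions), the hidden layer output matrix has entries A[s,i] = V(⟨ci, μs⟩ + ei), where the weights ci and biases ei are drawn at random and then kept fixed:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_matrix(X, M, rng):
    # A[s, i] = V(<c_i, mu_s> + e_i) with random, fixed input weights and biases.
    n_features = X.shape[1]
    C = rng.standard_normal((n_features, M))   # input weights c_i
    e = rng.standard_normal(M)                 # biases e_i
    return sigmoid(X @ C + e)

rng = np.random.default_rng(42)
X = rng.standard_normal((303, 13))        # e.g. 303 records, 13 input attributes
A = hidden_matrix(X, M=100, rng=rng)      # M = 100 hidden nodes, as in this section
```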
Solving the ELM means finding an optimal output weight w = [w1ᵀ,...,wMᵀ]ᵀ such that Aw = T, where T = [r1ᵀ,...,rNᵀ]ᵀ is the training target data. The least squares problem is used for finding a solution of the linear equation Aw = T, since the Moore–Penrose generalized inverse A† may not exist or may not be easy to compute. To reduce overfitting of the model in training, we consider the constrained least squares problem on a closed convex subset C of H1 as follows:
where C = {x∈H1 : ‖x‖1 ≤ γ} and γ is a regularization parameter. To apply our inertial Mann relaxed CQ algorithm to the problem (4.1), we define f(μ) := (1/2)‖(I−PQ)Aμ‖² for all μ∈H1 with Q = {T}, and let c(μ) = ‖μ‖1 − γ and q(μ) = (1/2)‖μ−T‖².
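Putting the pieces together, the following is a minimal sketch of a plain (non-inertial) relaxed CQ iteration for (4.1). Since Q = {T}, the projection PQ is constant and ∇f(μ) = Aᵀ(Aμ − T), while sign(μ) is a subgradient of ‖·‖1 at μ; the inertial and Mann steps of Algorithm 3.1 are omitted here for brevity:

```python
import numpy as np

def relaxed_cq_l1(A, T, gamma, n_iter=100):
    # Sketch for min (1/2)||A w - T||^2 subject to ||w||_1 <= gamma.
    lam = 1.0 / np.linalg.norm(A, 2) ** 2      # stepsize inside (0, 2/||A||^2)
    mu = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ mu - T)              # grad f(mu), since P_Q(y) = T
        x = mu - lam * grad
        # Project onto the half-space C_n = {z : c(mu) + <u, z - mu> <= 0},
        # with u = sign(mu) a subgradient of the l1 norm at mu.
        u = np.sign(mu)
        g = (np.abs(mu).sum() - gamma) + u @ (x - mu)
        if g > 0 and np.any(u):
            x = x - (g / (u @ u)) * u
        mu = x
    return mu
```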
The following four evaluation metrics, Accuracy, Precision, Recall, and F1-score [18], are considered for comparing the performance of the classification algorithms:
where TP := True Positive, FN := False Negative, TN := True Negative and FP := False Positive.
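These metrics reduce to the standard definitions in terms of the four counts; a small sketch of ours computes all four from binary (0/1) labels:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))   # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```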
The binary cross-entropy loss function is the mean of the cross-entropy between two probability distributions: the probability distribution we want versus the probability distribution estimated by the model. It is computed as the following average:
where ŷi is the i-th scalar value in the model output, yi is the corresponding target value, and K is the number of scalar values in the model output.
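A direct sketch of this average; the clipping constant `eps` is our numerical safeguard against log(0), not part of the formula:

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # BCE = -(1/K) * sum_i [ y_i log(yhat_i) + (1 - y_i) log(1 - yhat_i) ].
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```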
We start the computation by setting the activation function to sigmoid, the number of hidden nodes to M = 100, the regularization parameter to λ = 1×10⁻⁵, and αn = 1/(n+1) for Algorithms 3.1, 3.3, and 3.4, with λn = 0.9/(2 max(eigenvalue(AᵀA))) for Algorithm 3.1, λ1 = 0.9/(2 max(eigenvalue(AᵀA))) and ρ1 = ρ2 = 1.99 for Algorithm 3.3, and λ1 = 0.9/(2 max(eigenvalue(AᵀA))) for Algorithm 3.4. The stopping criterion is 100 iterations. We compare the performance of the algorithm with different parameters σ̄n, as seen in Table 3, when
where N is the number of iterations at which we stop. We can see that the parameter σn satisfies the required condition in Algorithm 3.1, Algorithm 3.3, and Algorithm 3.4 in each case of σ̄n in Table 3, and that σ̄n = 2¹³/(‖μn−μn−1‖³ + n³ + 2¹³) greatly improves the performance of Algorithm 3.1, Algorithm 3.3, and Algorithm 3.4. We therefore choose it as the default inertial parameter in our later calculations.
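For reproducibility, the chosen bound is cheap to evaluate at each iteration; the sketch below follows our reconstruction of the formula above:

```python
import numpy as np

def sigma_bar(mu, mu_prev, n):
    # sigma_bar_n = 2^13 / (||mu_n - mu_{n-1}||^3 + n^3 + 2^13).
    return 2.0 ** 13 / (np.linalg.norm(mu - mu_prev) ** 3 + n ** 3 + 2.0 ** 13)
```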
Setting σ̄n = 2¹³/(‖μn−μn−1‖³ + n³ + 2¹³) and αn = 1/(n+1) for Algorithms 3.1, 3.3, and 3.4, with ρ1 = ρ2 = 1.99 for Algorithm 3.3, and again stopping after 100 iterations, we obtain the results for different parameters h, where λn = h/(2 max(eigenvalue(AᵀA))) for Algorithm 3.1, and for different parameters λ1 for Algorithm 3.3 and Algorithm 3.4, as seen in Table 4.
We can see that h = λ1 = 1.9999 greatly improves the performance of Algorithm 3.1, Algorithm 3.3, and Algorithm 3.4. We therefore choose it as the default stepsize in our later calculations.
Setting the inertial parameter σ̄n = 2¹³/(‖μn−μn−1‖³ + n³ + 2¹³) with λn = 1.9999/(2 max(eigenvalue(AᵀA))) for Algorithm 3.1, and σ̄n = 2¹³/(‖μn−μn−1‖³ + n³ + 2¹³) with λ1 = 1.9999/(2 max(eigenvalue(AᵀA))) for Algorithm 3.3 and Algorithm 3.4, where ρ1 = ρ2 = 1.99 for Algorithm 3.3, the comparison of all algorithms with different parameters αn is presented in Table 5.
We can see that αn = 0.5 greatly improves the performance of Algorithm 3.1, Algorithm 3.3, and Algorithm 3.4; therefore, we choose it as the default parameter αn in our later calculations. We then compare the performance of FISTA, IRCQA, and our algorithm, with all parameters chosen as seen in Table 6.
For this comparison, we set sigmoid as the activation function, the number of hidden nodes M = 100, and the regularization parameter λ = 1×10⁻⁵.
Table 7 shows that our algorithm is among those with the highest precision, recall, F1-score, and accuracy; additionally, it requires the lowest number of iterations. This means that it has the highest probability of correctly classifying heart disease among the algorithms examined. We next present the training and validation loss, together with the training accuracy, to show that our algorithm yields a well-fitted model on the training dataset.
From Figures 1–3, we can see that the training loss and validation loss both decrease, with the validation loss lower than the training loss. Conversely, the training accuracy and validation accuracy both increase, with the validation accuracy higher than the training accuracy.
5. Conclusions
This paper considers solving split feasibility problems using inertial Mann relaxed CQ algorithms. Under suitable conditions imposed on the parameters, we have proved the weak convergence of the algorithm. Moreover, we present different stepsize modifications to achieve an efficient algorithm. We show the efficiency of our algorithm by comparing it with different machine learning methods, as well as with the extreme learning machine combined with the FISTA and IRCQA algorithms, for data classification on the UCI Machine Learning Heart Disease dataset. The results show that our algorithms outperform the other algorithms.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This research was supported by the NSRF via the program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F640183] and Chiang Mai University. This research was also supported by the National Research Council of Thailand (N42A650334) and Thailand Science Research and Innovation, the University of Phayao (Grant No. FF66-UoE).
Data availability
The dataset used in this research is publicly available at the UCI machine learning repository on https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
Conflict of interest
The authors declare no conflicts of interest.