1. Introduction
Let \mathcal{H} be a real Hilbert space equipped with an inner product \langle\cdot, \cdot\rangle and its induced norm \|\cdot\| . In this work, we deal with the minimization of a strongly convex smooth function over common fixed-point constraints, which is of the following form:
\min\left\{ f(\mathbf{x}) : \mathbf{x}\in \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_i}\right\}, \qquad (1.1)
where {\rm{Fix}}\; T_i: = \{ \mathbf{x}\in \mathcal{H}: T_i(\mathbf{x}) = \mathbf{x}\} and the objective function and the constrained operators satisfy the following assumptions:
(A1) The function f: \mathcal{H}\to\mathbb{R} is \eta -strongly convex and L -smooth.
(A2) The operators T_i: \mathcal{H}\to \mathcal{H} , i = 1, \ldots, m , are cutters with \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_i}\neq\emptyset .
Thanks to the closedness and convexity of the common fixed-point set \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_i} and the strong convexity of f , the solution set of the problem (1.1) is a singleton. Throughout this work, we denote the solution point of the problem (1.1) by \mathbf{x}^* .
1.1. Related works
The minimization problem and its generalizations (such as the variational inequality problem) over common fixed-point constraints have been investigated by many authors; we briefly note some related works as follows. Bauschke [1] considered the problem (1.1) in the case when the objective function is given by f(\mathbf{x}): = \frac{1}{2}\| \mathbf{x}- \mathbf{a}\|^2 , where \mathbf{a}\in \mathcal{H} is a given point and the operators T_i , i = 1, 2, \ldots, m , are nonexpansive. He proposed the following method: Given \mathbf{x}^1\in \mathcal{H} and a nonnegative sequence \{\beta_k\}_{k = 1}^\infty , for all k\in\mathbb{N} , compute
where the function [\cdot] is the modulo m function taking values in \{1, \ldots, m\} . Bauschke proved that the sequence \{ \mathbf{x}^k\}_{k = 1}^\infty generated by the method (1.2) converges strongly to the unique solution of the considered problem, provided that the constraint set satisfies
and the sequence \{\beta_k\}_{k = 1}^\infty\subset[0, 1) satisfies \lim\limits_{k\to\infty}\beta_k = 0 , \sum_{k = 1}^\infty{{\beta _k}} = \infty , and \sum_{k = 1}^\infty|\beta_{k+m}-\beta_k| < \infty .
After that, Yamada [2] considered a more general setting of the problem (1.1), namely the variational inequality problem governed by a strongly monotone and Lipschitz continuous operator F: \mathcal{H}\to \mathcal{H} . Thanks to the optimality condition, it is well known that the problem (1.1) can be recast in this setting by taking F: = \nabla f . He proposed the so-called hybrid steepest descent method, which is of the following form: Given \mathbf{x}^1\in \mathcal{H} and a nonnegative sequence \{\beta_k\}_{k = 1}^\infty , for all k\in\mathbb{N} , compute
Under the same assumption on the constraint set (1.3) and similar conditions on the control sequence \{\beta_k\}_{k = 1}^\infty , the strong convergence of the method (1.4) to the unique solution of the considered problem was obtained.
Xu and Kim [3] investigated the problem in the same setting as Yamada [2] and established the convergence of the method (1.4) under a variant condition on the control sequence \{\beta_k\}_{k = 1}^\infty\subset(0, 1] , namely \lim\limits_{k \to \infty }\frac{\beta_k}{\beta_{k+1}} = 1 , in place of the condition \sum_{k = 1}^\infty|\beta_{k+m}-\beta_k| < \infty given in Yamada [2]. The other conditions on the control sequence \{\beta_k\}_{k = 1}^\infty and the assumption on the constraint set (1.3) remain the same as above. After the works of Yamada [2] and Xu and Kim [3], many authors considered variants and generalizations; see, for instance, [4,5,6,7,8,9].
Prangprakhon et al. [10] considered solving the variational inequality problem over the intersection of fixed-point sets of firmly nonexpansive operators. They presented an iterative algorithm that can be viewed as a generalization of the hybrid steepest descent method in which appropriate additional information is allowed when computing the operator values. The main feature of their method is this added information, which can be interpreted as allowing possible numerical errors in the computation of the operators' values. They subsequently proved the strong convergence of the proposed method without the assumption (1.3) on the constraint set.
Prangprakhon and Nimana [11] subsequently proposed the so-called extrapolated sequential constraint method with conjugate gradient direction (ESCoM-CGD) for solving the variational inequality problem over the intersection of the fixed-point sets of cutters. ESCoM-CGD is motivated by the ideas of the hybrid conjugate gradient method [12], an accelerated version of the hybrid steepest descent method, together with the extrapolated cyclic cutter methods [13,14]. This method has two interesting features, namely the extrapolation stepsize function \sigma(\mathbf{y}^k) (see Step 2 of Algorithm 1) and the search direction \mathbf{d}^{k+1} , which is a combination of the current direction -{\nabla f(\mathbf{x}^{k+1})} and the previous search direction \mathbf{d}^k . By assuming the boundedness of the generated sequence \{ \mathbf{x}^k\}_{k = 1}^\infty together with assumptions on the control sequences, they proved the strong convergence of the proposed method.
Very recently, Petrot et al. [15] proposed the so-called dynamic distributed three-term conjugate gradient method for solving the strongly monotone variational inequality problem over the intersection of fixed-point sets of firmly nonexpansive operators. Unlike [11], this method has a simultaneous structure that allows the independent computation of each operator along with a dynamic weight, which is updated at every iteration. Moreover, the strong convergence of the method was obtained without assuming that the sequence \{ \mathbf{x}^k\}_{k = 1}^\infty is bounded.
1.2. Our contribution
In this work, we present a modified version of ESCoM-CGD that uses the search direction considered in [15]. We prove the convergence of the proposed method without assuming that the generated sequence is bounded. It is important to underline that our proof is simpler and shorter than that of ESCoM-CGD given in [11, Theorem 3]. We also show that the proposed algorithm and its convergence result are applicable to binary classification via support vector machine learning.
The remainder of this work is organized as follows. In the rest of this section, we collect some mathematical preliminaries consisting of useful definitions and facts needed in the proofs. In Section 2, we present the modified version of ESCoM-CGD for finding the solution to the problem (1.1); the main convergence theorem and some important remarks are also given there. In Section 3, we provide some technical convergence properties of the generated sequences and, subsequently, prove the main convergence theorem. In Section 4, we illustrate the usefulness of the proposed method and its convergence result in solving the minimum-norm problem for a system of homogeneous linear inequalities, as well as binary classification via support vector machine learning. Lastly, we give a concluding remark.
1.3. Mathematical preliminaries
In this subsection, we recall some definitions, properties, and key tools used in proving our convergence results. The reader can consult the books [16,17] for more details.
We denote by Id the identity operator on the Hilbert space \mathcal{H} . The strong and weak convergence of a sequence \{ \mathbf{x}^k\} _{k = 1}^\infty to a point \mathbf{x}\in \mathcal{H} are denoted by \mathbf{x}^k \to \mathbf{x} and \mathbf{x}^k \rightharpoonup \mathbf{x} , respectively.
We will recall the convexity and smoothness of a function. Now, let f: \mathcal{H}\to\mathbb{R} be a real-valued function. Let \alpha > 0 be given. We say that the function f is convex if, for all \mathbf{x}, \mathbf{y}\in \mathcal{H} and for all \lambda\in[0, 1] , it holds that
In particular, we say that the function f is \alpha -strongly convex if, for all \mathbf{x}, \mathbf{y}\in \mathcal{H} and for all \lambda\in[0, 1] , it holds that
The constant \alpha is called the strong convexity parameter.
Let \mathbf{x}\in\mathcal{H} be given. We say that the function f is Fréchet differentiable or, in short, differentiable at \mathbf{x} if there exists a vector \mathbf{g}\in\mathcal{H} such that
Moreover, we call this unique vector \mathbf{g} the gradient of f at \mathbf{x} and denote it by \nabla f(\mathbf{x}) .
It is well known that the strong convexity of f can be characterized in terms of its gradient as follows.
Fact 1.1. [16, Theorem 5.24] Let f: \mathcal{H}\to\mathbb{R} be a real-valued differentiable function, then f is \alpha -strongly convex if, and only if, for all \mathbf{x}, \mathbf{y}\in \mathcal{H} , we have
The following fact is a cornerstone of our arguments. It gives a necessary and sufficient condition for the optimality of a convex function over a nonempty closed and convex set.
Fact 1.2. [16, Corollary 3.68] Let f:\mathcal{H}\to \mathbb{R} be a continuously differentiable convex function and C\subset \mathcal{H} be a closed and convex set, then f attains its minimum at a point \mathbf{x}^*\in C if, and only if, for all \mathbf{y}\in C , it holds that
Let L\geq 0 be given. We say that the function f is L - smooth if it is differentiable and, for all \mathbf{x}, \mathbf{y} \in \mathcal{H} , it satisfies that
The constant L is called the smoothness parameter. It is clear that an L -smooth function is a continuously differentiable function.
The following fact is a very useful tool in obtaining the convergence result. The proof can be found in [2, Lemma 3.1(b)].
Fact 1.3. Let f:\mathcal{H}\to \mathbb{R} be an \eta -strongly convex and L -smooth function. For any \mu\in(0, {2\eta}/{L^2}) and \beta\in(0, 1] , if we define the operator F^{\beta}:\mathcal{H}\to\mathcal{H} by F^{\beta}(\mathbf{x}): = \mathbf{x}-\mu\beta\nabla f(\mathbf{x}) for all \mathbf{x}\in \mathcal{H} , then
\|F^{\beta}(\mathbf{x})-F^{\beta}(\mathbf{y})\|\leq(1-\tau\beta)\| \mathbf{x}- \mathbf{y}\|
for all \mathbf{x}, \mathbf{y}\in\mathcal{H} , where \tau: = 1-\sqrt{1+{\mu}^2{L}^2-2\mu\eta}\in(0, 1] .
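To see Fact 1.3 in action, the following minimal Python sketch (our own illustration, not part of the original paper) numerically checks the contraction property of F^{\beta} for a strongly convex quadratic f(\mathbf{x}) = \frac{1}{2}\langle Q\mathbf{x}, \mathbf{x}\rangle , for which \eta and L are the extreme eigenvalues of Q ; the matrix Q and the parameters \mu , \beta below are arbitrary choices within the admissible ranges.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly convex quadratic f(x) = 0.5 * x^T Q x with Q symmetric positive definite,
# so that eta and L are the smallest and largest eigenvalues of Q.
B = rng.standard_normal((5, 5))
Q = B @ B.T + np.eye(5)
eigenvalues = np.linalg.eigvalsh(Q)
eta, L = eigenvalues[0], eigenvalues[-1]

grad = lambda x: Q @ x                       # gradient of f

mu = eta / L**2                              # any mu in (0, 2*eta/L^2)
beta = 0.7                                   # any beta in (0, 1]
tau = 1.0 - np.sqrt(1.0 + mu**2 * L**2 - 2.0 * mu * eta)

F_beta = lambda x: x - mu * beta * grad(x)   # the operator F^beta of Fact 1.3

x, y = rng.standard_normal(5), rng.standard_normal(5)
lhs = np.linalg.norm(F_beta(x) - F_beta(y))
rhs = (1.0 - tau * beta) * np.linalg.norm(x - y)
print(lhs <= rhs + 1e-12)                    # expected output: True
```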
Next, we recall the definition of cutters and some useful properties.
Let T: \mathcal{H}\to \mathcal{H} be an operator with {\rm{Fix}}\; T: = \{ \mathbf{x}\in \mathcal{H}:T(\mathbf{x}) = \mathbf{x}\}\neq\emptyset . We say the operator T is a cutter if, for all \mathbf{x}\in \mathcal{H} and \mathbf{z}\in{\rm{Fix}}\; T , we have
The following are important properties of cutters.
Fact 1.4. [17, Remark 2.1.31 and Lemma 2.1.36.] Let T: \mathcal{H}\to \mathcal{H} be a cutter, then the following statements hold:
(i) {\rm{Fix}}\; T is closed and convex.
(ii) For all \mathbf{x}\in \mathcal{H} and for all \mathbf{z}\in{\rm{Fix}}\; T , we have \langle T(\mathbf{x}) - \mathbf{x}, \mathbf{z} - \mathbf{x}\rangle\ge\|T(\mathbf{x}) - \mathbf{x}\|^{2}.
Next, we recall the definitions of relaxations of an operator. Let T: \mathcal{H} \to \mathcal{H} be an operator and \lambda\in[0, 2] be given. The relaxation of the operator T is defined by T_\lambda(\mathbf{x}): = (1-\lambda) \mathbf{x}+\lambda T(\mathbf{x}), for all \mathbf{x}\in \mathcal{H} . In this case, we call the parameter \lambda a relaxation parameter. Moreover, let \sigma: \mathcal{H}\to(0, \infty) be a function. The generalized relaxation of the operator T is defined by
for all \mathbf{x}\in \mathcal{H} . In this case, we call the function \sigma a stepsize function. If \sigma (\mathbf{x}) \geq 1 for all \mathbf{x}\in \mathcal{H} , then we call the operator {T_{\sigma, \lambda}} an extrapolation of T_\lambda . We denote T_{\sigma}: = T_{\sigma, 1} . It is worth noting that, for all \mathbf{x}\in \mathcal{H} , the following hold
and for all \lambda\neq0 , we also have
For simplicity, we denote the compositions of operators as
and
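For readers who prefer code, the following short Python sketch (ours; all names are placeholders) implements a relaxation, a generalized relaxation, and the composition S_i = T_i T_{i-1}\cdots T_1 , assuming the standard formula T_{\sigma, \lambda}(\mathbf{x}) = \mathbf{x}+\lambda\sigma(\mathbf{x})(T(\mathbf{x})- \mathbf{x}) for the generalized relaxation recalled above.

```python
import numpy as np
from functools import reduce

def relaxation(T, lam):
    """Relaxation T_lam(x) = (1 - lam) * x + lam * T(x)."""
    return lambda x: (1.0 - lam) * x + lam * T(x)

def generalized_relaxation(T, sigma, lam=1.0):
    """Generalized relaxation T_{sigma, lam}(x) = x + lam * sigma(x) * (T(x) - x)."""
    return lambda x: x + lam * sigma(x) * (T(x) - x)

def composition(operators):
    """S(x) = T_m(... T_2(T_1(x)) ...), i.e., the composition S_m = T_m T_{m-1} ... T_1."""
    return lambda x: reduce(lambda v, T: T(v), operators, x)

# Example: the metric projection onto the unit ball (a cutter) and its relaxation.
P_ball = lambda x: x if np.linalg.norm(x) <= 1.0 else x / np.linalg.norm(x)
T_half = relaxation(P_ball, 0.5)
print(T_half(np.array([3.0, 4.0])))   # halfway between x and P_ball(x): [1.8 2.4]
```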
We now recall some important properties that will be useful in the convergence analysis.
Fact 1.5. [17, Section 4.10] Let T_i:\mathcal{H}\to\mathcal{H} , i = 1, 2, \ldots, m , be cutters with \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_i}\neq\emptyset . Let \sigma : \mathcal{H}\to(0, \infty) be defined by
then the following properties hold:
(i) For all \mathbf{x}\notin\bigcap_{i = 1}^{m}{\rm{Fix}}\;{T_i} , we have
(ii) The operator T_{\sigma} is a cutter.
Next, we recall the notion of the demi-closedness principle.
Let T: \mathcal{H} \to \mathcal{H} be an operator having a fixed point. The operator T is said to satisfy the demi-closedness principle if, for every sequence \{ \mathbf{x}^k\}\subset \mathcal{H} such that \mathbf{x}^k\rightharpoonup \mathbf{x}\in \mathcal{H} and \|T(\mathbf{x}^k)- \mathbf{x}^k\| \to 0 , we have \mathbf{x}\in {\rm{Fix}}\; T .
We close this section by recalling the well-known metric projection, defined as follows: Let C be a nonempty subset of \mathcal{H} and \mathbf{x}\in \mathcal{H} . If there is a point \mathbf{y}\in C such that \| \mathbf{x} - \mathbf{y}\| \leq \| \mathbf{x} - \mathbf{z}\| for any \mathbf{z}\in C , then \mathbf{y} is said to be a metric projection of \mathbf{x} onto C and is denoted by P_C(\mathbf{x}) . If P_C(\mathbf{x}) exists and is uniquely determined for all \mathbf{x}\in \mathcal{H} , then the operator P_C: \mathcal{H}\to \mathcal{H} is said to be the metric projection onto C . Some additional properties of the set C are needed to ensure the existence and uniqueness of the metric projection P_C(\mathbf{x}) for a point \mathbf{x}\in \mathcal{H} , as in the following fact.
Fact 1.6. [17, Theorem 1.2.3.] Let C be a nonempty closed convex subset of \mathcal{H} , then for any \mathbf{x}\in \mathcal{H} , there exists a metric projection P_C(\mathbf{x}) and it is uniquely determined.
Moreover, we finally note that the metric projection onto a nonempty closed convex set C is a cutter with {\rm{Fix}}\; P_{C} = C ; see [17, Theorem 2.2.21.] for further details.
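As a concrete example (ours, not taken from the paper), the sketch below implements the metric projection onto a half-space H = \{ \mathbf{x}: \langle \mathbf{a}, \mathbf{x}\rangle\leq b\} , whose closed-form expression is recalled in Subsection 4.1, and numerically verifies the cutter inequality \langle \mathbf{x}-P_H(\mathbf{x}), \mathbf{z}-P_H(\mathbf{x})\rangle\leq 0 for a point \mathbf{z}\in H = {\rm{Fix}}\; P_H .

```python
import numpy as np

def proj_halfspace(x, a, b):
    """Metric projection onto the half-space H = {x : <a, x> <= b}."""
    violation = np.dot(a, x) - b
    if violation <= 0.0:          # x already belongs to H
        return x
    return x - (violation / np.dot(a, a)) * a

rng = np.random.default_rng(1)
a, b = rng.standard_normal(4), 0.5
x = 3.0 * rng.standard_normal(4)  # an arbitrary point, possibly outside H
z = np.zeros(4)                   # z = 0 lies in H since <a, 0> = 0 <= b
p = proj_halfspace(x, a, b)
print(np.dot(x - p, z - p) <= 1e-12)   # cutter inequality, expected output: True
```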
2. Algorithm and its convergence
In this section, we present a modified version of the extrapolated sequential constraint method with conjugate gradient direction (ESCoM-CGD) for solving the problem (1.1). We are now in a position to state it as the following algorithm, which we call the modified extrapolated sequential constraint method with conjugate gradient direction (in short, MESCoM-CGD).
Some comments and particular situations of MESCoM-CGD are the following:
Remark 2.1. (ⅰ) As mentioned above, the convergence of ESCoM-CGD (see [11, Theorem 3]) relies on the assumption that the generated sequence \{ \mathbf{x}^k\}_{k = 1}^\infty is bounded. However, it is also pointed out in [11, Remark 3] that the boundedness of the search directions \{ \mathbf{d}^k\}_{k = 1}^\infty\subset \mathcal{H} and of the sequence \{\varphi_{k}\}_{k = 1}^{\infty} , together with the definition of the search direction, yields the boundedness of the sequence \{ \mathbf{x}^k\}_{k = 1}^\infty . In this situation, if we use the bounded search direction \left\{\frac{ \mathbf{d}^{k}}{\max\{1, \| \mathbf{d}^{k}\|\}}\right\}_{k = 1}^\infty\subset \mathcal{H} in place of the previous version, then the boundedness of the generated sequence \{ \mathbf{x}^k\}_{k = 1}^\infty becomes provable (see Lemma 3.3 below). One can notice that the key idea of this bounding strategy is nothing else but restricting the search direction to the unit ball centered at the origin.
(ⅱ) If the search direction \{ \mathbf{d}^k\}_{k = 1}^\infty is guaranteed to stay within the unit ball centered at the origin, then \max\{1, \| \mathbf{d}^{k}\|\} = 1 , so that MESCoM-CGD reduces to ESCoM-CGD in the case when the operator T_m = Id .
(ⅲ) If m = 1 , the relaxation parameter \lambda_{k} = 1 , the stepsize \sigma(\mathbf{y}^{k}) = 1 , and \max\{1, \| \mathbf{d}^{k}\|\} = 1 for all k\in \mathbb{N} , then MESCoM-CGD is nothing else but the hybrid conjugate gradient method proposed in [12].
Next, let us state the assumptions needed for the convergence result.
Assumption 2.2. Assume that
(H1) The sequence of stepsizes \{\beta_{k}\}_{k = 1}^{\infty}\subset (0, 1] satisfies \sum\limits_{k = 1}^\infty\beta_{k} = \infty and \sum\limits_{k = 1}^\infty\beta_{k}^2 < \infty .
(H2) The sequence of parameters \{\varphi_{k}\}_{k = 1}^{\infty}\subset [0, \infty) satisfies \lim\limits_{k\to\infty}\varphi_{k} = 0 .
(H3) There is a constant \varepsilon\in(0, 1) such that the relaxation parameters \{\lambda_{k}\}_{k = 1}^{\infty}\subset[\varepsilon, 2-\varepsilon] .
(H4) The operators T_i , i = 1, \ldots, m , satisfy the demi-closedness principle.
Let us briefly discuss some particular situations in which hypotheses (H1)–(H4) hold as follows:
Remark 2.3. (ⅰ) An example of stepsizes \{\beta_{k}\}_{k = 1}^{\infty} satisfying (H1) is, for instance, \beta_k = \frac{\beta_0}{(k+1)^b} with \beta_0\in(0, 1] and b\in(1/2, 1] for all k\in \mathbb{N} ; note that b > 1/2 is needed for \sum_{k = 1}^\infty\beta_{k}^2 < \infty , while b\leq 1 ensures \sum_{k = 1}^\infty\beta_{k} = \infty (see the short code sketch after this remark).
(ⅱ) An example of the parameters \{\varphi_{k}\}_{k = 1}^{\infty} satisfying (H2) is, for instance, \varphi_k = \frac{\varphi_0}{(k+1)^c} with \varphi_0\geq0 and c > 0 for all k\in \mathbb{N} .
(ⅲ) The role of the constant \varepsilon\in(0, 1) is to ensure that the term \lambda_{k}(2-\lambda_{k}) stays away from zero for all k\in \mathbb{N} , which allows this term to be canceled in the estimates. One can take, for instance, \lambda_k = \lambda_0 for some \lambda_0\in(0, 2) as a trivial example of relaxation parameters \{\lambda_{k}\}_{k = 1}^{\infty} satisfying hypothesis (H3).
(ⅳ) The demi-closedness principle for the operators T_i , i = 1, \ldots, m , in (H4) is a key property and is typically assumed when dealing with convergence results for common fixed-point type problems. Actually, the demi-closedness principle is satisfied by every nonexpansive operator, i.e., an operator with \|T_i(\mathbf{x})-T_i(\mathbf{y})\|\leq\| \mathbf{x}- \mathbf{y}\| for all \mathbf{x}, \mathbf{y}\in \mathcal{H} ; see [17, Lemma 3.2.5.] for the proof and some historical notes. In particular, the metric projection onto a nonempty closed and convex set satisfies the demi-closedness principle; see [17, Theorem 2.2.21]. Moreover, it is worth mentioning that the demi-closedness principle is also satisfied by the subgradient projection operator P_f of a convex continuous function f: \mathcal{H}\to \mathbb{R} that is Lipschitz continuous relative to every bounded subset of \mathcal{H} ; see [17, Theorem 4.2.7] for more details.
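Regarding items (ⅰ)-(ⅲ), the following tiny Python helper (ours; the exponents b and c and the constant \lambda_0 are illustrative choices within the admissible ranges) generates control sequences of the forms discussed above.

```python
def control_sequences(K, beta0=1.0, b=0.75, varphi0=1.0, c=1.0, lam0=1.0):
    """Generate beta_k = beta0/(k+1)^b (1/2 < b <= 1 gives (H1)),
    varphi_k = varphi0/(k+1)^c (any c > 0 gives (H2)), and
    constant relaxation parameters lam_k = lam0 in (0, 2) for (H3)."""
    betas = [beta0 / (k + 1) ** b for k in range(1, K + 1)]
    varphis = [varphi0 / (k + 1) ** c for k in range(1, K + 1)]
    lams = [lam0] * K
    return betas, varphis, lams

betas, varphis, lams = control_sequences(5)
print(betas)   # decreasing stepsizes beta_1, ..., beta_5
```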
The main result of this work is the following theorem:
Theorem 2.4. Let the sequence \{ \mathbf{x}^k\}_{k = 1}^\infty be generated by MESCoM-CGD and assume that assumptions (A1) and (A2) and hypotheses (H1)–(H4) hold, then the sequence \{ \mathbf{x}^k\}_{k = 1}^\infty converges strongly to the unique solution \mathbf{x}^{*} of the problem (1.1).
3. Proof of Theorem 2.4
In this section, we provide some technical convergence properties and then close the section by proving the convergence of the proposed method to the unique solution of the problem (1.1).
For simplicity of notation, we denote
and, for every k\in\mathbb{N} and \mathbf{x}\in \bigcap_{i = 1}^{m}{\rm{Fix}}\;{T_{i}} , we denote
and
Moreover, for every k\geq{2} , we denote
In order to prove the convergence result in Theorem 2.4, we collect several facts that will be useful in what follows. We first state a lemma relating the iterate \mathbf{x}^k to a common fixed point. The analysis is similar to that given in [11, Lemma 4]; however, we state it here again for the sake of completeness.
Lemma 3.1. For all k\in\mathbb{N} and \mathbf{x}\in \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_{i}} , we have
Proof. Let k\in\mathbb{N} and \mathbf{x}\in \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_{i}} be given. By using the properties of extrapolation and Facts 1.4 (ⅱ) and 1.5(ⅱ), we note that
We note from the definition of \{ \mathbf{y}^k\}_{k = 1}^\infty that
By invoking this relation in (3.1), we obtain
as required. □
In order to prove the boundedness of the generated sequence \{ \mathbf{x}^k\}_{k = 1}^\infty , we need the following proposition [18, Lemma 3.1].
Proposition 3.2. Let \{a_{k}\}_{k = 1}^\infty be a sequence of nonnegative real numbers for which there exists a subsequence \{a_{k_j}\}_{j = 1}^\infty of \{a_{k}\}_{k = 1}^\infty with a_{k_j} < a_{k_j+1} for all j\in\mathbb{N} . If, for all k\geq{k_0} , we define
v(k): = \max\{\ell\in\mathbb{N}: \ell\leq k \ {\rm{and}} \ a_{\ell} < a_{\ell+1}\},
then the sequence \{v(k)\}_{k\geq{k_0}} is nondecreasing, a_{v(k)}\le{a_{v(k)+1}} , and a_{k}\le{a_{v(k)+1}} for every k\geq{k_0} .
Now, we are in a position to prove the boundedness of the generated sequence \{ \mathbf{x}^k\}_{k = 1}^\infty in the following lemma. The idea of the proof is based on [15, Lemma 5].
Lemma 3.3. The sequence \{ \mathbf{x}^k\}_{k = 1}^\infty is a bounded sequence.
Proof. Let k\in\mathbb{N} and \mathbf{x}\in \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_{i}} be given. Since \lambda_{k}\in[\varepsilon, 2-\varepsilon] , we have {\varepsilon}^{2}\le\lambda_{k}(2-\lambda_{k}) , and so
Thus, by using this property together with the inequality obtained in Lemma 3.1, we get
For the sake of simplicity, we set
for all k\in\mathbb{N} . In this case, the inequality (3.2) can be rewritten as
Next, we divide the proof into two cases based on the behaviors of the sequence \{\Gamma_{k}\}_{k = 1}^\infty :
Case Ⅰ: Suppose that there exists k_{0}\in\mathbb{N} such that \Gamma_{k+1}\leq \Gamma_k for all k\geq{k_0} . In this case, we have \Gamma_{k}\leq \Gamma_{k_0} for all k\geq{k_0} , which is
for all k\geq{k_0} . Since the series \sum_{k = 1}^\infty\beta_{k}^2 converges, we obtain that the sequence \{\| \mathbf{x}^k- \mathbf{x}\|^2\}_{k = 1}^\infty is bounded and, subsequently, the sequence \{ \mathbf{x}^k\}_{k = 1}^\infty is also bounded.
Case Ⅱ: Suppose that there exists a subsequence \{\Gamma_{k_j}\}_{j = 1}^\infty of \{\Gamma_{k}\}_{k = 1}^\infty such that \Gamma_{k_j} < \Gamma_{k_j+1} for all j\in\mathbb{N} , and let \{v(k)\}_{k = 1}^\infty be defined as in Proposition 3.2. This yields, for all k\geq{k_0} , that
and
By using the relation (3.4) in the inequality (3.3) and the fact that the term \frac{2\mu\beta_k}{\max\{1, \| \mathbf{d}^k\|\}} is positive, we have
Now, let us note that
and, since 0\leq\varphi_{k}\leq 1 , we have
Since f is \eta -strongly convex, we have from Fact 1.1 that
where the last inequality holds by the inequality (3.6). Thus, we obtain that
which implies that the sequence \{\|{ \mathbf{x}}^{v(k)}- \mathbf{x}^*\|\}_{k = 1}^\infty is bounded. Now, let us observe that
which means that the sequence \{\Gamma_{v(k)+1}\}_{k = 1}^\infty is bounded above. Finally, by using the inequality (3.5), we obtain that \{\Gamma_{k}\}_{k = 1}^\infty is bounded and, subsequently, \{ \mathbf{x}^k\}_{k = 1}^{\infty} is also bounded. □
A simple consequence of Lemma 3.3 is the boundedness of related sequences.
Lemma 3.4. The sequences \{{\nabla f(\mathbf{x}^{k})}\}_{k = 1}^{\infty} , \{ \mathbf{d}^k\}_{k = 1}^{\infty} , and \{ \mathbf{y}^k\}_{k = 1}^{\infty} are bounded.
Proof. Let k\in\mathbb{N} and \mathbf{x}\in \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_{i}} be given. We first note from the L -smoothness of f that
where M_1: = \sup_{k\in \mathbb{N}}\| \mathbf{x}^k\| . This means that the boundedness of the sequence \{{\nabla f(\mathbf{x}^{k})}\}_{k = 1}^{\infty} is now obtained.
Next, by the construction of \mathbf{d}^k , we note for all k\geq 1 that
Thus, we have \| \mathbf{d}^k\|\leq \max\{M+1, \| \mathbf{d}^1\|\} for all k\in \mathbb{N} and the boundedness of \{ \mathbf{d}^k\}_{k = 1}^{\infty} is obtained. Finally, these two obtained results immediately yield the boundedness of the sequence \{ \mathbf{y}^k\}_{k = 1}^{\infty} . □
Lemma 3.5. For all k\geq{2} and \mathbf{x}\in \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_{i}} , it holds that
Proof. Let k\geq{2} and \mathbf{x}\in \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_{i}} be given. We note from the inequality (3.1) that
where the second inequality holds by the fact that {\| \mathbf{x}+ \mathbf{y}\|}^{2}\le{\| \mathbf{x}\|}^2+2\langle{ \mathbf{y}, \mathbf{x}+ \mathbf{y}}\rangle for all \mathbf{x}, \mathbf{y} \in\mathcal{H} , and the third inequality holds by Fact 1.3. □
The following proposition will be a key tool for obtaining the convergence result in Theorem 2.4. The proposition and its proof can be found in [19, Lemma 2.6] and [20, Lemma 2.4].
Proposition 3.6. Let \{a_k\}_{k = 1}^\infty be a nonnegative real sequence, \{\delta_k\}_{k = 1}^\infty be a real sequence, and \{\alpha_k\}_{k = 1}^\infty be a real sequence in [0, 1] such that \sum\limits_{k = 1}^\infty\alpha_{k} = \infty . Suppose that
a_{k+1}\leq(1-\alpha_{k})a_{k}+\alpha_{k}\delta_{k} \ {\rm{for\ all}}\ k\in\mathbb{N}.
If \limsup\limits_{j\to\infty}\delta_{k_j}\leq{0} for every subsequence \{a_{k_j}\}_{j = 1}^\infty of \{a_k\}_{k = 1}^\infty satisfying
\liminf\limits_{j\to\infty}\left(a_{k_j+1}-a_{k_j}\right)\geq 0,
then \lim\limits_{k\to\infty}a_{k} = 0 .
Now, we are in a position to prove Theorem 2.4.
Proof. Let \mathbf{x}^* be the unique solution to the problem (1.1). For simplicity, we denote a_{k}: = {\| \mathbf{x}^k- \mathbf{x}^*\|}^2 for all k\in\mathbb{N} . Now, by using the facts obtained in Lemmata 3.3 and 3.4 and the fact that \lim\limits_{k\to\infty}\beta_{k} = 0 , we obtain
which implies that \lim_{k\to\infty}\varepsilon_{k} = 0 .
Next, let \{a_{k_j}\}_{j = 1}^\infty be a subsequence of the sequence \{a_k\}_{k = 1}^\infty satisfying
or, equivalently,
By utilizing the inequality obtained in Lemma 3.1 and the fact \lim\limits_{k\to\infty}\varepsilon_{k} = 0 , we obtain
This implies that
Since \varepsilon^2\le\lambda_{k}(2-\lambda_{k}) , we obtain that
On the other hand, since \{ \mathbf{y}^k\}_{k = 1}^\infty is a bounded sequence, so is the sequence \{\langle{ \mathbf{y}^{k_j}}- \mathbf{x}^*, {\nabla f(\mathbf{x}^*)}\rangle\}_{j = 1}^\infty . Now, let \{ \mathbf{y}^{k_{j_\ell}}\}_{\ell = 1}^\infty be a subsequence of \{ \mathbf{y}^{k_j}\}_{j = 1}^\infty such that
Due to the boundedness of the sequence \{ \mathbf{y}^{k_j}\}_{j = 1}^\infty , there exist a weak cluster point \mathbf{z}\in\mathcal{H} and a subsequence \{ \mathbf{y}^{k_{j_\ell}}\}_{\ell = 1}^\infty of \{ \mathbf{y}^{k_j}\}_{j = 1}^\infty such that \mathbf{y}^{k_{j_\ell}}\rightharpoonup{ \mathbf{z}} . According to the fact obtained in (3.7), we note that
Since T_1 satisfies the demi-closedness principle, we obtain that \mathbf{z}\in{\rm{Fix}}\;{T_1} . Furthermore, we note that the facts \mathbf{y}^{k_{j_\ell}}\rightharpoonup{ \mathbf{z}} and
yield that T_{1}(\mathbf{y}^{k_{j_\ell}})\rightharpoonup T_1(\mathbf{z}) = { \mathbf{z}} . Furthermore, we observe that
By invoking the assumption that T_{2} satisfies the demi-closedness principle, we also obtain that \mathbf{z}\in{\rm{Fix}}\;{T_{2}} . By continuing the same argument, we obtain that \mathbf{z}\in{\rm{Fix}}\;{T_{i}} for all i = 1, 2, \ldots, m , and hence \mathbf{z}\in \bigcap\limits_{i = 1}^{m}{\rm{Fix}}\;{T_i} . Since \mathbf{x}^* is the unique solution to the problem (1.1), we note from the optimality condition in Fact 1.2 that
Now, the assumption that \lim\limits_{k\to\infty}\varphi_{k} = 0 , the boundedness of the sequences \{ \mathbf{y}^k\}_{k = 1}^\infty and \{ \mathbf{d}^k\}_{k = 1}^\infty , and the relation (3.8) yield that
Hence, by applying Proposition 3.6, we conclude that \lim\limits_{k\to\infty}a_k = 0 . The proof is completed. □
4. Some illustrated consequences
In this section, we provide some simple consequences of the main theorem. We start with the minimization problem over a system of linear inequalities; this constraint is nothing else but an intersection of half-spaces. We subsequently show that the proposed algorithm and its convergence result are also applicable to the well-known support vector machine learning problem.
4.1. Minimization problem over system of linear inequalities
In this subsection, we assume that the function f: \mathcal{H} \to \mathbb{R} is \eta -strongly convex and L -smooth as given in assumption (A1). Now, let \mathbf{a}_i\in \mathcal{H} with \mathbf{a}_i\neq \mathbf{0} and b_i\in \mathbb{R} , i = 1, \ldots, m . We consider the minimization of a strongly convex smooth function over the intersection of the corresponding half-spaces, which is of the following form:
\min\left\{ f(\mathbf{x}) : \langle \mathbf{a}_i, \mathbf{x}\rangle\leq b_i, \ i = 1, \ldots, m\right\}. \qquad (4.1)
We assume the consistency of the system of linear inequalities and denote the solution point of the problem (4.1) by \mathbf{x}^* . Now, for each i = 1, \ldots, m , we let H_i: = \{ \mathbf{x}\in \mathcal{H}: \langle \mathbf{a}_i, \mathbf{x}\rangle \leq b_i\} be the half-space corresponding to \mathbf{a}_i and b_i . Furthermore, we set T_i: = P_{H_i} , the metric projection onto the half-space H_i , for all i = 1, \ldots, m . It is worth noting that the metric projection onto a half-space has the closed-form expression
P_{H_i}(\mathbf{x}) = \mathbf{x}-\frac{\max\{0, \langle \mathbf{a}_i, \mathbf{x}\rangle-b_i\}}{\| \mathbf{a}_i\|^2}\, \mathbf{a}_i;
see [17, Subsection 4.1.3] for further details. Note that H_i is a closed and convex set with fixed-point set {\rm{Fix}}\; T_i = H_i for all i = 1, \ldots, m . To derive an iterative method for solving the problem (4.1), we recall the notations T: = T_m T_{m-1}\cdots T_1 , S_0: = Id , and S_i: = T_i T_{i-1}\cdots T_1 , for all i = 1, 2, \ldots, m . Now, for every \mathbf{x}\in \mathcal{H} , by setting \mathbf{u}^i : = S_i(\mathbf{x}) , we have \mathbf{u}^0 = \mathbf{x} and \mathbf{u}^i = T_i(\mathbf{u}^{i-1}) , for all i = 1, 2, \ldots, m. In this case, the stepsize function \sigma: \mathcal{H}\to(0, \infty) can be written as follows:
see [13, Section 4.2] for further details.
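To make the sequential structure concrete, the following Python sketch (ours; names are placeholders) computes the intermediate points \mathbf{u}^i = T_i(\mathbf{u}^{i-1}) = P_{H_i}(\mathbf{u}^{i-1}) for a system of half-spaces, using the closed-form projection recalled above.

```python
import numpy as np

def proj_halfspace(x, a, b):
    """Closed-form metric projection onto H = {x : <a, x> <= b}."""
    violation = np.dot(a, x) - b
    return x if violation <= 0.0 else x - (violation / np.dot(a, a)) * a

def sequential_points(x, A, b):
    """Return [u^0, u^1, ..., u^m] with u^0 = x and u^i = P_{H_i}(u^{i-1})."""
    u = [np.asarray(x, dtype=float)]
    for a_i, b_i in zip(A, b):
        u.append(proj_halfspace(u[-1], a_i, b_i))
    return u                      # u[m] = T(x) = (T_m ... T_1)(x)

# Example with m = 3 random half-spaces in R^2.
rng = np.random.default_rng(2)
A, b = rng.standard_normal((3, 2)), np.abs(rng.standard_normal(3))
print(sequential_points(rng.standard_normal(2), A, b)[-1])
```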
In order to solve the problem (4.1), we consider a particular situation of MESCoM-CGD as the following algorithm.
We obtain an immediate consequence of Theorem 2.4 in the following corollary.
Corollary 4.1. Let the sequence \{ \mathbf{x}^k\}_{k = 1}^\infty be generated by Algorithm 2 and assume that assumption (A1) and hypotheses (H1)–(H3) hold, then the sequence \{ \mathbf{x}^k\}_{k = 1}^\infty converges strongly to the unique solution \mathbf{x}^{*} of the problem (4.1).
To examine the behavior of Algorithm 2 and the convergence result given in Corollary 4.1, we consider solving the minimum-norm problem for a system of homogeneous linear inequalities. We assume that the whole space \mathcal{H} is finite dimensional, so that \mathcal{H} = \mathbb{R}^n . Given a matrix \mathbf{A} = [ \mathbf{a}_1|\cdots| \mathbf{a}_m]^T of predictors \mathbf{a}_i = (\mathbf{a}_{1i}, \ldots, \mathbf{a}_{ni})\in \mathbb{R}^n , i = 1, \ldots, m , and a vector \mathbf{b} = (b_1, \ldots, b_m)\in\mathbb{R}^m , the minimum-norm problem is to find the vector \mathbf{x}\in \mathbb{R}^n that solves the problem
\min\left\{\tfrac{1}{2}\| \mathbf{x}\|^2 : \mathbf{A} \mathbf{x}\leq \mathbf{b}\right\}, \qquad (4.2)
or, equivalently, in the explicit form
\min\left\{\tfrac{1}{2}\| \mathbf{x}\|^2 : \langle \mathbf{a}_i, \mathbf{x}\rangle\leq b_i, \ i = 1, \ldots, m\right\}.
Now, by taking the constraint sets H_i: = \{ \mathbf{x}\in\mathbb{R}^n : \langle{ \mathbf{a}_i, \mathbf{x}}\rangle\leq b_i\} , i = 1, \ldots, m , to be half-spaces and T_i: = P_{H_i} , i = 1, \ldots, m , the metric projections onto H_i with {\rm{Fix}}\; T_i = H_i satisfy the demi-closedness principle. Furthermore, the function f(\mathbf{x}): = \frac{1}{2}\| \mathbf{x}\|^2 is 1 -strongly convex and 1 -smooth. In this situation, the problem (4.2) is nothing else but a special case of the problem (4.1), which yields that Algorithm 2 and the convergence result given in Corollary 4.1 are applicable.
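For illustration only, the following Python sketch (ours) sets up the minimum-norm instance of this subsection with \mathbf{b} = \mathbf{0} and runs a basic hybrid-steepest-descent-type iteration \mathbf{x}^{k+1} = T(\mathbf{x}^k)-\mu\beta_k\nabla f(T(\mathbf{x}^k)) , where T is the composition of all the half-space projections applied at every iteration. It is a simplified stand-in in the spirit of the method (1.4); it is not the authors' Algorithm 2, which additionally employs the conjugate gradient direction and the extrapolated stepsize.

```python
import numpy as np

def proj_halfspace(x, a, b):
    violation = np.dot(a, x) - b
    return x if violation <= 0.0 else x - (violation / np.dot(a, a)) * a

def cyclic_projection(x, A, b):
    # T = P_{H_m} P_{H_{m-1}} ... P_{H_1}
    for a_i, b_i in zip(A, b):
        x = proj_halfspace(x, a_i, b_i)
    return x

rng = np.random.default_rng(3)
m, n = 200, 100
A = rng.uniform(-5.0, 5.0, size=(m, n))   # entries uniform in (-5, 5), as in the experiment below
b = np.zeros(m)                           # homogeneous system, so the solution is x* = 0

grad = lambda x: x                        # f(x) = 0.5 ||x||^2, hence eta = L = 1
mu = 1.9                                  # mu in (0, 2*eta/L^2) = (0, 2)
x = rng.standard_normal(n)
for k in range(1, 5001):
    beta_k = 1.0 / (k + 1) ** 0.75        # stepsizes satisfying (H1)
    y = cyclic_projection(x, A, b)
    x_next = y - mu * beta_k * grad(y)
    if np.linalg.norm(x_next - x) < 1e-4:
        break
    x = x_next
print(k, np.linalg.norm(x))               # the norm should be close to 0
```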
To perform a numerical illustration for solving the problem (4.2), we generate the matrix \mathbf{A}\in \mathbb{R}^{m\times n} , where m = 200 and n = 100 , with entries uniformly distributed in (-5, 5) , and set the vector \mathbf{b} = \mathbf{0} . According to Remark 2.3 (ⅰ)-(ⅲ), we examine the influence of the parameters \beta_0\in [0.8, 1] , \varphi_0\in[0.1, 1] , and \lambda_0\in [0.1, 1.9] . We fix the corresponding parameter \mu = 1.9 . We terminate the experiment when the error of norm satisfies \| \mathbf{x}^{k+1}- \mathbf{x}^k\| < \epsilon . We manually select the parameters \beta_0 = 1 , \varphi_0 = 1 , and \lambda_0 = 1 , which yield the smallest number of iterations when the tolerance \epsilon = 10^{-4} is met.
Next, we show the behavior of Algorithm 2 when solving the problem (4.2) for various tolerances \epsilon . We set the number of variables to n = 100 and consider several numbers of inequalities, namely m = 100, 200, 300, 400 , and 500 . According to the above experiments, we set the parameters \beta_0 = 1 , \varphi_0 = 1 , and \lambda_0 = 1 . We plot the number of iterations and the computational time in seconds in Figure 1.
It can be noticed from Figure 1 that a smaller number m required a larger number of iterations for all tolerances. Moreover, for m = 100, 200 , and 300 , a smaller number m also required a larger computational time.
4.2. Support vector machine learning problem
In this subsection, we consider the construction of a classifier for the binary classification problem, starting from a given training dataset of two classes. More precisely, for all i = 1, \ldots, m , we are given the i th training data \mathbf{a}_i\in \mathbb{R}^n and its label b_i\in\{-1, +1\} . The support vector machine learning problem is to train a weight \mathbf{u}\in \mathbb{R}^n so that the (linear) classifier c(\mathbf{a}): = \langle \mathbf{a}, \mathbf{u}\rangle assigns the correct class to every new test data \mathbf{a}\in \mathbb{R}^n . In this situation, \mathbf{a} is identified with the class -1 if c(\mathbf{a}) < 0 , and with the class +1 otherwise. Mathematically, the support vector machine learning problem can be formulated as the following minimization problem:
where \xi_i \geq 0 is a nonnegative slack variable corresponding to the i th training data, for all i = 1, \ldots, m . By introducing a new variable \mathbf{x}: = (\mathbf{u}, \xi_1, \ldots, \xi_m)\in \mathbb{R}^{n+m} and using the idea given in, for instance, [15,21], the problem (4.3) can be written as
where the matrix \mathbf{A} \in \mathbb{R}^{2m\times (n+m)} is given by
and the vector \mathbf{b} \in \mathbb{R}^{2m} is given by
The problem (4.4) is nothing else but a particular case of the problem (4.2) and, consequently, Algorithm 2 and the convergence result given in Corollary 4.1 are also applicable.
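The specific matrix \mathbf{A} and vector \mathbf{b} of (4.4) are given above; as one possible encoding (an assumption of ours, based on the usual soft-margin constraints b_i\langle \mathbf{a}_i, \mathbf{u}\rangle\geq 1-\xi_i and \xi_i\geq 0 without a bias term), they can be assembled with the following Python sketch. The resulting rows may differ from the paper's display in ordering or sign conventions.

```python
import numpy as np

def build_svm_system(data, labels):
    """Encode b_i * <a_i, u> >= 1 - xi_i and xi_i >= 0 as a system A x <= b
    with x = (u, xi_1, ..., xi_m) in R^(n+m).  (One possible sign convention.)"""
    m, n = data.shape
    top = np.hstack([-labels[:, None] * data, -np.eye(m)])   # -b_i <a_i, u> - xi_i <= -1
    bottom = np.hstack([np.zeros((m, n)), -np.eye(m)])       # -xi_i <= 0
    A = np.vstack([top, bottom])                             # shape (2m, n+m)
    b = np.concatenate([-np.ones(m), np.zeros(m)])
    return A, b

# Example with m = 4 training points in R^3 and labels in {-1, +1}.
rng = np.random.default_rng(4)
data, labels = rng.standard_normal((4, 3)), np.array([1.0, -1.0, 1.0, -1.0])
A, b = build_svm_system(data, labels)
print(A.shape, b.shape)   # (8, 7) (8,)
```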
In the first experiment, we aim to classify the 28\times 28 gray-scale images of handwritten digits from the MNIST dataset, which is available at https://cs.nyu.edu/roweis/data.html. We use the dataset of 5000 images of the handwritten digit 9 and the dataset of 5000 images of the handwritten digits 0-8 , labeled by the classes +1 and -1 , respectively. We perform 10 -fold cross-validation on the given datasets. In each round, we take one fold of 1000 images as the testing data and the remaining nine folds, consisting of 9000 images, as the training data. We repeat the cross-validation process 10 times so that each fold serves once as the testing data.
Recall that the class of the digit 9 and the class of the digits 0-8 are labeled by +1 and -1 , that is, positive and negative, respectively. For a tested data point, we count the following numbers: if it is labeled as positive and classified as positive, it is counted as a true positive (TP); if it is labeled as positive but classified as negative, it is counted as a false negative (FN); if it is labeled as negative and classified as negative, it is counted as a true negative (TN); and if it is labeled as negative but classified as positive, it is counted as a false positive (FP). These numbers are summarized as follows:
To measure the performance of the classifier obtained in each cross-validation round, we consider the classification performance metrics, namely, accuracy, precision, recall, specificity, and F-measure. These performance metrics are computed as follows:
{\rm{Accuracy}} = \frac{TP+TN}{TP+TN+FP+FN}, \qquad {\rm{Precision}} = \frac{TP}{TP+FP}, \qquad {\rm{Recall}} = \frac{TP}{TP+FN},
{\rm{Specificity}} = \frac{TN}{TN+FP}, \qquad \text{F-measure} = \frac{2\cdot{\rm{Precision}}\cdot{\rm{Recall}}}{{\rm{Precision}}+{\rm{Recall}}}.
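A compact Python helper (ours) computing these standard metrics from the confusion counts TP, FN, TN, and FP:

```python
def classification_metrics(TP, FN, TN, FP):
    """Standard binary classification metrics computed from the confusion counts."""
    accuracy    = (TP + TN) / (TP + TN + FP + FN)
    precision   = TP / (TP + FP)
    recall      = TP / (TP + FN)                 # also called sensitivity
    specificity = TN / (TN + FP)
    f_measure   = 2.0 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f_measure

print(classification_metrics(TP=90, FN=10, TN=880, FP=20))
```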
In this experiment, we terminate the method after k = 50 iterations and, subsequently, average each performance metric over the 10 cross-validation rounds. The classification performance metrics for each choice of parameters are presented in Table 1.
The results given in Table 1 show that the parameters \beta_0 and \varphi_0 only slightly affected the classification performance. The highest accuracy and F-measure values were obtained for the case \beta_0 = 0.5 . Actually, we can observe that these five metrics, as well as the running times, were not much different. This suggests the stability of the proposed method in the sense that the corresponding parameters do not strongly affect its convergence.
5. Conclusions
We presented a modified version of the extrapolated sequential constraint method with conjugate gradient direction proposed in [11] for solving the minimization problem over the intersection of the fixed-point sets of cutters. We not only presented a simpler proof of the strong convergence of the proposed method, but also removed the boundedness assumption used in [11]. We examined the proposed method on binary classification via support vector machines.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
The authors are grateful to three anonymous reviewers for their comments and suggestions that improved the paper's quality and presentation. This work has received funding support from the NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F650018]. D. Rakjarungkiat gratefully acknowledges the research capability enhancement program through graduate student scholarship, Faculty of Science, Khon Kaen University.
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this article.