Sparse optimization is a core problem in compressed sensing [1,2,3], signal and image processing [4,5,6,7], high-dimensional statistical learning [4,8], and related fields. Sparsity is usually characterized by the \ell_0 norm, i.e., the cardinality of the support set of a vector \mathbf{x}\in\mathbb{R}^n , denoted by \|\mathbf{x}\|_0 = |\mathrm{supp}(\mathbf{x})| = |\{i\in\{1, 2, \cdots, n\}:x_i\neq0\}| . The penalized formulation of sparse optimization can be expressed as the following cardinality regularized optimization problem:
\begin{equation} \begin{aligned} \min\limits_{\mathbf{x}\in\mathbb{R}^n}\; f(\mathbf{x})+\lambda\|\mathbf{x}\|_0, \end{aligned} \end{equation} | (1.1)
where f: \mathbb{R}^n\to \mathbb{R} is a loss function depending on the application and \lambda > 0 is the penalty parameter.
Compared with sparse optimization, composite sparse optimization problems enforce certain structural constraints, rather than pure sparsity, on the coefficients; such problems arise in many important applications, including structural health monitoring [10], fault diagnosis [11], motion planning [12], and impact force identification [13]. A common approach is to promote sparsity of the variables through a linear transformation [14]. By imposing a regularization matrix W = (W_1^{\top}, \ldots, W_p^{\top})^{\top}\in \mathbb{R}^{p\times n} on the vector \mathbf{x} , composite sparse optimization can be encapsulated as the following optimization formulation:
\begin{equation} \begin{aligned} \min\limits_{\mathbf{x}\in\mathbb{R}^n}\; f(\mathbf{x})+\lambda\|W\mathbf{x}\|_0. \end{aligned} \end{equation} | (1.2) |
A typical choice of function f in problem (1.2) is the \ell_2 loss function given by f(\mathbf{x}) = \frac{1}{2}\|A\mathbf{x}-b\|_2^2 , where A = (A_1^{\top}, \ldots, A_m^{\top})^{\top}\in\mathbb{R}^{m\times n} and b = (b_1, \ldots, b_m)^{\top}\in \mathbb{R}^m , together with the \ell_1 relaxation of the \ell_0 norm given by \|W\mathbf{x}\|_1 ; this formulation was first defined and summarized as the generalized LASSO with a general W [15]. Unlike the traditional LASSO, solving the generalized LASSO efficiently on high-dimensional data is very challenging. A few attempts have been made to improve its efficiency, but they require a specific form of W to work well [14,15,16], such as the fused LASSO problem [17], the TV regularizer [18], and trend filtering [19].
However, the loss functions of many composite sparse optimization problems cannot be expressed as differentiable functions. The results in [20] showed that the least squares loss function can solve a class of linear regression problems but is not suitable for all types of data. One can instead choose a loss function that is robust against outliers, such as the {\mathcal{\ell}_1} function, the quantile regression function, or more general Huber-type functions. On the other hand, J. Fan et al. [20] pointed out that the {\mathcal{\ell}_1} relaxation often results in a biased estimator, and various continuous nonconvex relaxation functions for the {\mathcal{\ell}_0} norm have been proposed, such as the smoothly clipped absolute deviation (SCAD) function [20], the hard thresholding function [21], the capped- {\mathcal{\ell}_1} function [22,23,24,25,26], the transformed {\mathcal{\ell}_1} function [27], etc. Here, we are interested in the capped- {\mathcal{\ell}_1} function as the relaxation of the {\mathcal{\ell}_0} norm, since it is a simple relaxation function satisfying specific properties. Z. Shen et al. [40] applied locally Lipschitz continuous scaled folded concave functions to approximate the {\mathcal{\ell}_0} pseudo-norm; a generic nonsmooth but convex framework was established to gradually approximate the scaled folded concave functions, and numerical results showed that the proposed framework and algorithms retain the exact sparsity-inducing capability of the {\mathcal{\ell}_0} pseudo-norm. Q. Chen et al. [41] first explored using a class of locally Lipschitz scaled folded concave functions to approximate the {\mathcal{\ell}_0} norm; then a convex half-absolute method was proposed to precisely approximate these nonconvex nonsmooth functions, and a double iterative algorithm was designed to solve the convex-relaxed composite optimization problems. Both [40] and [41] established a generic nonsmooth convex framework that gradually approximates these scaled folded concave functions based on the Legendre-Fenchel transformation, so as to avoid directly solving a nonconvex optimization problem. In contrast, we use a smoothing function for the approximation and solve the nonconvex optimization problem directly. The advantages of the capped- {\mathcal{\ell}_1} function have been explored in various fields. For example, the authors in [28] put forward a capped {\mathcal{\ell}_1} -norm sparse representation method (CSR) for graph clustering; the proposed model learns the optimal graph for clustering by integrating sparse representation and a capped {\mathcal{\ell}_1} -norm loss function. In order to utilize the advantages of the twin extreme learning machine and FDA, Z. Xue et al. [29] first put forward a novel classifier named the Fisher-regularized twin extreme learning machine (FTELM); considering the instability of the {\mathcal{\ell}_2} -norm with respect to outliers, the authors further introduced the capped {\mathcal{\ell}_1} -norm into the FTELM model and proposed a more robust capped {\mathcal{\ell}_1} -norm FTELM (C {\mathcal{\ell}_1} -FTELM) model. The capped {\mathcal{\ell}_1} function was also discussed in the context of sparse group {\mathcal{\ell}_0} regularized algorithms in [30]. It is worth noting that reference [9] gave an exact continuous relaxation, with the capped- {\mathcal{\ell}_1} penalty, of the cardinality-penalized problem with a nonsmooth convex loss, in the sense that both problems have the same optimal solution set.
Moreover, a smoothing proximal gradient algorithm for finding a lifted stationary point of the continuous relaxation model was proposed in [9]. Regarding the solution of relaxation problems, T. Zhang [42] presented a multi-stage convex relaxation scheme for solving problems with nonconvex objective functions; however, only the parameter estimation performance was analyzed in [42], and the result there does not directly imply that multi-stage convex relaxation achieves unbiased recovery of the support set. H. Zhou et al. [43] proposed a new unified algorithm based on the local linear approximation (LLA) for maximizing the penalized likelihood for a broad class of concave penalty functions, but it does not eliminate the bias issue. Here, we extend the results in [9] to composite sparse optimization and give a smoothing gradient descent algorithm for the continuous relaxation problem. The new algorithm exploits the piecewise linearity of the capped- {\mathcal{\ell}_1} penalty term in the relaxation problem. In view of the composite sparsity, the subproblem in the algorithm does not have a closed-form solution; our relaxation model is then treated analogously to the {\mathcal{\ell}_1} -penalized model of the LASSO problem, and a smoothing gradient method is used to solve the subproblem. We prove that if the sequence generated by the algorithm has an accumulation point, then it is a lifted stationary point of the relaxation problem.
In this paper, we consider the following composite sparse regression problem with cardinality penalty,
\begin{equation} \begin{aligned} \min\limits_{\mathbf{x}\in\Omega} \; \; \; \mathcal{W}_{\mathcal{\ell}_0}(\mathbf{x}): = f(\mathbf{x})+\lambda\|W\mathbf{x}\|_0, \end{aligned} \end{equation} | (1.3) |
where f: \mathbb{R}^n\to \mathbb{R} is a convex (not necessarily smooth) function that is bounded from below, \lambda is a positive parameter, and \Omega = \{\mathbf{x}\in\mathbb{R}^n:l\leq W\mathbf{x}\leq u\} . For example, f can be the \ell_1 loss function given by
\begin{equation} \begin{aligned} f(\mathbf{x}) = \frac{1}{m}\|A\mathbf{x}-b\|_1, \end{aligned} \end{equation} | (1.4) |
or the censored regression problem with
\begin{equation} \begin{aligned} f(\mathbf{x}) = \frac{1}{m}\sum\limits_{i = 1}^m|\max\{A_i\mathbf{x},0\}-b_i|. \end{aligned} \end{equation} | (1.5) |
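For concreteness, these two losses can be written as MATLAB function handles; this is a minimal sketch and assumes the problem data A , b and m are already defined.

```matlab
% Example loss handles for (1.4) and (1.5) (sketch; A, b, m are assumed problem data).
f_l1       = @(x) norm(A*x - b, 1)/m;              % l1 loss (1.4)
f_censored = @(x) sum(abs(max(A*x, 0) - b))/m;     % censored regression loss (1.5)
```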
For a given parameter \nu > 0 , the continuous relaxation of the \ell_0 penalty with the capped- \ell_1 function \phi is given by
\begin{equation} \begin{aligned} \phi(t) = \min\Big\{1,\frac{|t|}{\nu}\Big\}. \end{aligned} \end{equation} | (1.6) |
We consider the following continuous optimization problem to solve (1.3):
\begin{equation} \begin{aligned} \min\limits_{\mathbf{x}\in\Omega} \; \; \; \mathcal{W}(\mathbf{x}): = f(\mathbf{x})+\lambda\Phi(W\mathbf{x}), \end{aligned} \end{equation} | (1.7) |
where \Phi(W\mathbf{x}) = \sum_{i = 1}^p\phi(W_i\mathbf{x}) .
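The capped- \ell_1 penalty and the relaxed objective can likewise be assembled from handles; the following is a minimal sketch, where f is any loss handle (e.g., f_l1 from the sketch above) and W , \lambda , \nu are assumed to be given.

```matlab
% Capped-L1 penalty (1.6) and relaxed objective (1.7) as handles (sketch).
phi  = @(t) min(1, abs(t)/nu);          % capped-L1 function (1.6), applied elementwise
Phi  = @(Wx) sum(phi(Wx));              % Phi(Wx) = sum_i phi(W_i x)
calW = @(x) f(x) + lambda*Phi(W*x);     % objective of the relaxed problem (1.7)
```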
Composite sparse optimization has attracted much attention recently. In [15], a dual path algorithm was proposed for the generalized LASSO problem with an arbitrary W . If the composite optimization problem is convex and W is a general linear map, one feasible approach is to apply the alternating direction method of multipliers (ADMM) [31]. In [31], the author proposed a dual method for variational problems of the form \inf\{f(Av)+g(v)\} . Z. J. Bai [32] provided a coordinate gradient descent method, with stepsize chosen by an Armijo-type rule, to solve the problems \min_{\mathbf{x}} f(\mathbf{x})+c\|L\mathbf{x}\|_1 and \min_{\mathbf{x}} \|A\mathbf{x}-b\|_2^2+c\|L\mathbf{x}\|_1 efficiently, especially when the problem dimension is large.
In this paper, we use the exact continuous relaxation problem with the capped- {\mathcal{\ell}_1} function to solve optimization problem (1.3) and present a smoothing gradient descent (SGD) algorithm. Since W is a general linear mapping and \Phi(W\mathbf{x}) is not separable, the proximal gradient algorithm cannot be applied with an explicit update, so we solve the subproblem in the SGD algorithm approximately. We prove that if the generated sequence has an accumulation point, then this accumulation point is a lifted stationary point of (1.7).
Notation. We denote \mathbb{N} = \{0, 1, \ldots\} . For \mathbf{x}\in \mathbb{R}^n and \delta > 0 , let \|\mathbf{x}\|: = \|\mathbf{x}\|_2 and \mathbb{B}_\delta(\mathbf{x}) means the open ball centered at \mathbf{x} with radius \delta . For a nonempty, closed, and convex set \Omega\subseteq \mathbb{R}^n , N_\Omega(\mathbf{x}) means the normal cone to \Omega at \mathbf{x}\in\Omega . \sigma_{\mathrm{min}}(W) is the minimum singular value of W . We denote \|\mathbf{x}\|_\infty = \max\{|\mathbf{x}_1|, |\mathbf{x}_2|, \ldots, |\mathbf{x}_n|\} .
Before starting this section, we first make the following two assumptions:
Assumption 2.1. Function f is Lipschitz continuous on \Omega with Lipschitz constant L_f > 0 and matrix W has full column rank.
Assumption 2.2. Parameter \nu in (1.6) satisfies 0 < \nu < \overline{\nu}: = \frac{\lambda\sigma_{\min}(W)}{L_f} .
We suppose Assumptions 2.1 and 2.2 hold throughout the paper and assume that L_f is large enough such that L_f\geq\frac{\lambda\sigma_{\min}(W)}{\Gamma} , where
\Gamma: = \min\{|l_i|,u_i:l_i\neq0,u_i\neq0,i = 1,\ldots,p\}. |
When f is defined by the {\mathcal{\ell}_1} loss function (1.4) or the censored regression loss function (1.5), L_f can be taken as L_f = \|A\|_\infty .
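In practice, Assumption 2.2 can be checked directly from the data; the following is a sketch, in which we read \|A\|_\infty as the induced matrix \infty -norm (this reading is our assumption).

```matlab
% Checking Assumption 2.2 for given data (sketch; ||A||_inf read as the induced inf-norm).
Lf     = norm(A, inf);                    % Lipschitz constant L_f of the loss
nu_bar = lambda*min(svd(W))/Lf;           % bar{nu} = lambda*sigma_min(W)/L_f
assert(nu > 0 && nu < nu_bar, 'nu must lie in (0, nu_bar)');
```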
We first give the definition of lifted stationary points of (1.7), following [33]. Since the function \phi in (1.6) can be rephrased as
\phi(t) = \frac{1}{\nu}|t|-\max\{\theta_1(t),\theta_2(t),\theta_3(t)\} |
with \theta_1(t) = 0 , \theta_2(t) = \frac{t}{\nu}-1 and \theta_3(t) = -\frac{t}{\nu}-1 for t\in\mathbb{R} , we denote
\begin{equation} \mathbb{D}(t): = \{i\in\{1,2,3\}:\theta_i(t) = \max\{\theta_1(t),\theta_2(t),\theta_3(t)\}\} \end{equation} | (2.1) |
and
\begin{equation} \mathbb{D}^p(W\mathbf{x}): = \Pi_{i = 1}^{p}\mathbb{D}(W_i{\mathbf{x}}). \end{equation} | (2.2) |
Definition 2.1. Point \mathbf{x}\in\Omega is called a lifted stationary point of (1.7) if there exists d = (d_1, \ldots, d_p)^{\top}\in\mathbb{D}^p(W\mathbf{x}) such that
\begin{equation} \lambda\sum\limits_{i = 1}^p(\theta_{d_i}'(W_i\mathbf{x}))^{\top}\in\partial f(\mathbf{x})+\frac{\lambda}{\nu} \sum\limits_{i = 1}^p W_i^{\top}\vartheta^i(\mathbf{x})+N_\Omega(\mathbf{x}), \end{equation} | (2.3) |
where
\begin{equation} \vartheta^i(\mathbf{x}) \begin{cases} = 1 & \mathrm{if} \; W_i\mathbf{x} > 0,\\ \in[-1,1] &\mathrm{if}\; |W_i\mathbf{x}| = 0,\\ = -1 & \mathrm{if}\; W_i\mathbf{x} < 0. \end{cases} \end{equation} | (2.4) |
Given the range of \nu specified in Assumption 2.2, we first prove that, at a lifted stationary point, the element of \mathbb{D}^p(W\mathbf{x}) satisfying (2.3) is unique.
Proposition 2.2. If \bar{\mathbf{x}} is a lifted stationary point of (1.7), then the vector d^{W\bar{\mathbf{x}}} = (d_1^{W\bar{\mathbf{x}}}, \ldots, d_p^{W\bar{\mathbf{x}}})^{\top}\in \mathbb{D}^p(W\bar{\mathbf{x}}) satisfying (2.3) is unique. In particular, for i = 1, \ldots, p,
\begin{equation} d_i^{W\bar{\mathbf{x}}} = \begin{cases} 1 & \mathrm{if} \; |W_i\bar{\mathbf{x}}| < {\nu},\\ 2 & \mathrm{if} \; W_i\bar{\mathbf{x}}\geq{\nu },\\ 3 & \mathrm{if} \; W_i\bar{\mathbf{x}}\leq{-\nu }. \end{cases} \end{equation} | (2.5) |
Proof. If |W_i\bar{\mathbf{x}}|\neq\nu , the statement is clearly valid. Hence, we only need to consider the case |W_i\bar{\mathbf{x}}| = \nu . When W_i\bar{\mathbf{x}} = \nu , since \mathbb{D}(W_i\bar{\mathbf{x}}) = \{1, 2\}, arguing by contradiction, we assume (2.3) holds with d_i^{W{\bar{\mathbf{x}}}} = 1 , so \vartheta^i(\bar{\mathbf{x}}) = 1 . By \nu < \bar{\nu}, we have W_i\bar{\mathbf{x}}\in(l_i, u_i) , and by (2.3), there exists \xi(\bar{\mathbf{x}})\in\partial f(\bar{\mathbf{x}}) such that
\begin{equation} \mathbf{0} = \xi(\bar{\mathbf{x}})+\frac{\lambda}{\nu}\sum\limits_{{i:|W_i\bar{\mathbf{x}}|\leq\nu}} W_{i}^{\top}\vartheta^i(\bar{\mathbf{x}}), \end{equation} | (2.6) |
where \vartheta^i(\bar{\mathbf{x}}) is defined as (2.4).
It is easy to observe that the following relation holds
\begin{equation} \begin{aligned} \left\|\sum\limits_{{i:|W_i\bar{\mathbf{x}}|\leq\nu}} W_{i}^{\top}\vartheta^i(\bar{\mathbf{x}})\right\|_2\geq\sigma_{\min}(W). \end{aligned} \end{equation} | (2.7) |
In fact, from the definition of the minimum singular value of W ,
\begin{equation*} \label{abcd} \begin{aligned} \sigma_{\min}(W) = \min\left\{\frac{\|W\mathbf{x}\|_2}{\|\mathbf{x}\|_2}:\mathbf{x}\neq\mathbf{0}\right\} = \min\left\{\|W\mathbf{x}\|_2:\|\mathbf{x}\|_2 = 1\right\}, \end{aligned} \end{equation*} |
we have
\begin{equation*} \label{abed} \begin{aligned} \sigma_{\min}(W)& = \min\left\{\frac{\|W\mathbf{x}\|_2}{\|\mathbf{x}\|_2}:\mathbf{x}\neq\mathbf{0}\right\}\\ &\leq\min\left\{\frac{\|W\mathbf{x}\|_2}{\|\mathbf{x}\|_2}:\|\mathbf{x}\|_2\geq1,\|\mathbf{x}\|_\infty\leq1\right\}\\ &\leq\min\left\{\|W\mathbf{x}\|_2:\|\mathbf{x}\|_2\geq1,\|\mathbf{x}\|_\infty\leq1\right\}\\ &\leq\min\{\|W\mathbf{x}\|_2:\|\mathbf{x}\|_2 = 1\}\\ & = \sigma_{\min}(W). \end{aligned} \end{equation*} |
Then, we see that
\begin{equation*} \label{abod} \begin{aligned} \sigma_{\min}(W) = \min\{\|W\mathbf{x}\|_2:\|\mathbf{x}\|_2\geq1,\|\mathbf{x}\|_\infty\leq1\}. \end{aligned} \end{equation*} |
From the definition of \vartheta^i(\bar{\mathbf{x}}) (2.4), this yields that
\begin{equation*} \begin{aligned} \left\|\sum\limits_{{i:|W_i\bar{\mathbf{x}}|\leq\nu}} W_{i}^{\top}\vartheta^i(\bar{\mathbf{x}})\right\|_2\geq\sigma_{\min}(W_I)\geq\sigma_{\min}(W), \end{aligned} \end{equation*} |
where W_I is the submatrix consisting of the rows in W indexed by I: = \{i:|W_i\bar{\mathbf{x}}|\leq\nu\} [34].
Combining (2.6) and (2.7), we have
\frac{\lambda\sigma_{\min}(W)}{\nu}\leq\frac{\lambda}{\nu}\left\|\sum\limits_{{i:|W_i\bar{\mathbf{x}}|\leq\nu}} W_{i}^{\top}\vartheta^i(\bar{\mathbf{x}})\right\| = \|\xi(\bar{\mathbf{x}})\|\leq L_f. |
This contradicts \nu < \frac{\lambda\sigma_{\min}(W)}{L_f} . Then, (2.5) holds for W_i\bar{\mathbf{x}} = \nu . A similar analysis applies to the case W_i\bar{\mathbf{x}} = -\nu , which completes the proof.
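For later use in the algorithm, the selection vector d^{W\mathbf{x}} in (2.5) can be evaluated at any point \mathbf{x} by simple thresholding; a minimal MATLAB sketch, assuming W , \nu and \mathbf{x} are given:

```matlab
% Evaluating the selection vector d^{Wx} of (2.5) at a point x (sketch).
Wx = W*x;
d  = ones(size(Wx));      % default: |W_i x| < nu  =>  d_i = 1
d(Wx >=  nu) = 2;         %          W_i x >= nu   =>  d_i = 2
d(Wx <= -nu) = 3;         %          W_i x <= -nu  =>  d_i = 3
```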
For a given d = (d_1, \ldots, d_p)^{\top}\in\mathbb{D}^p: = \{d\in\mathbb{R}^p:d_i\in\{1, 2, 3\}, i = 1, \cdots, p\} , we define
\begin{equation} \Phi^d(W\mathbf{x}): = \sum\limits_{i = 1}^p\frac{|W_i\mathbf{x}|}{\nu}-\sum\limits_{i = 1}^p\theta_{d_i}(W_i\mathbf{x}), \end{equation} | (2.8) |
which is convex with respect to \mathbf{x} . It can be verified that
\Phi(W\mathbf{x}) = \min\limits_{d\in\mathbb{D}^p}\Phi^d(W\mathbf{x}),\; \forall \mathbf{x}\in \Omega. |
In particular, for a fixed \bar{\mathbf{x}}\in\Omega , \Phi(W\bar{\mathbf{x}}) = \Phi^{d^{W\bar{\mathbf{x}}}}(W\bar{\mathbf{x}}) and the following relation holds
\begin{equation} \bar{\mathbf{x}} \mathrm{\; is \; a\; lifted \; stationary \; point\; of\; } (1.7) \Leftrightarrow\bar{\mathbf{x}}\in \arg\min\limits_{\mathbf{x}\in\Omega} f(\mathbf{x})+\lambda\Phi^{d^{W\bar{\mathbf{x}}}}(W\mathbf{x}). \end{equation} | (2.9) |
The next lemma describes a lower bound property.
Lemma 2.3. If \bar{\mathbf{x}}\in\Omega is a lifted stationary point of (1.7), then it holds that
\begin{equation} W_i\bar{\mathbf{x}}\in(-\nu,\nu)\; \; \Rightarrow \; \; W_i\bar{\mathbf{x}} = 0,\; \; \; \; \forall i = 1,\ldots,p. \end{equation} | (2.10) |
Proof. Suppose that \bar{\mathbf{x}} is a lifted stationary point of (1.7), and assume, to the contrary, that W_i\bar{\mathbf{x}}\in(-\nu, \nu)\setminus\{0\} for some i\in\{1, \ldots, p\} . Then, from (2.5) and Assumption 2.1, we have d_i^{W\bar{\mathbf{x}}} = 1 and W_i\bar{\mathbf{x}}\in(l_i, u_i) . By (2.3), there exists \xi(\bar{\mathbf{x}})\in\partial f(\bar{\mathbf{x}}) such that
\mathbf{0} = \xi(\bar{\mathbf{x}})+\frac{\lambda}{\nu}\sum\limits_{{i:|W_i\bar{\mathbf{x}}| < \nu}} W_{i}^{\top}\vartheta^i(\bar{\mathbf{x}}). |
Through the analysis in the proof of Proposition 2.2, combining (2.6) and (2.7), we have
\frac{\lambda\sigma_{\min}(W)}{\nu}\leq \frac{\lambda}{\nu}\left\|\sum\limits_{{i:|W_i\bar{\mathbf{x}}| < \nu}} W_{i}^{\top}\vartheta^i(\bar{\mathbf{x}})\right\| = \|\xi(\bar{\mathbf{x}})\|\leq L_f, |
which contradicts \nu < \frac{\lambda\sigma_{\min}(W)}{L_f} . Consequently, W_i\bar{\mathbf{x}}\in(-\nu, \nu) implies W_i\bar{\mathbf{x}} = 0 for i\in\{1, \ldots, p\} , and the proof is completed.
This subsection discusses the relationship between the global minimizers and local minimizers of (1.3) and (1.7). First, Theorem 2.4 discusses the relationship between the local minimizers of (1.3) and (1.7). Second, Theorem 2.5 states that (1.3) and (1.7) have the same global minimizers. We use the lower bound property mentioned in Lemma 2.3 to prove Theorems 2.4 and 2.5.
Theorem 2.4. If \bar{\mathbf{x}} is a lifted stationary point of (1.7), then it is a local minimizer of (1.3) and the objective functions have the same value at \bar{\mathbf{x}} , i.e., f(\bar{\mathbf{x}})+\lambda\Phi(W\bar{\mathbf{x}}) = f(\bar{\mathbf{x}})+\lambda\|W\bar{\mathbf{x}}\|_0 .
Proof. Combining the lower bound property of W\bar{\mathbf{x}} in (2.10) and the definition of \Phi^{d^{W\bar{\mathbf{x}}}} in (2.8), for any \mathbf{x} sufficiently close to \bar{\mathbf{x}} we have
\begin{equation*} \begin{aligned} \Phi^{d^{W\bar{\mathbf{x}}}}(W\mathbf{x}):& = \sum\limits_{i = 1}^p\frac{|W_i\mathbf{x}|}{\nu}-\sum\limits_{i = 1}^p\theta_{d_i^{W\bar{\mathbf{x}}}}(W_i\mathbf{x})\\ & = \sum\limits_{i:|W_i\bar{\mathbf{x}}|\geq\nu}1+\sum\limits_{i:|W_i\bar{\mathbf{x}}| < \nu}\frac{|W_i\mathbf{x}|}{\nu}\\ & = \|W\bar{\mathbf{x}}\|_0+\sum\limits_{i:W_i\bar{\mathbf{x}} = 0}\frac{|W_i\mathbf{x}|}{\nu}. \end{aligned} \end{equation*} |
Then
\begin{equation} \Phi^{d^{W\bar{\mathbf{x}}}}(W\mathbf{x})\leq\|W\mathbf{x}\|_0,\; \forall \mathbf{x}\in\mathbb{B}_\varrho(\bar{\mathbf{x}}), \; \varrho > 0. \end{equation} | (2.11) |
Combining this with \Phi(W\bar{\mathbf{x}}) = \|W\bar{\mathbf{x}}\|_0 and (2.9), we have
f(\bar{\mathbf{x}})+\lambda\|W\bar{\mathbf{x}}\|_0\leq f(\mathbf{x})+\lambda\|W\mathbf{x}\|_0,\; \; \; \; \forall \mathbf{x}\in\Omega\cap\mathbb{B}_\varrho(\bar{\mathbf{x}}). |
Thus, \bar{\mathbf{x}} is a local minimizer of (1.3).
Theorem 2.4 indicates that any lifted stationary point of (1.7) is a local minimizer of (1.3); since every local minimizer of (1.7) is a lifted stationary point of (1.7), it follows in particular that any local minimizer of (1.7) is also a local minimizer of (1.3).
Theorem 2.5. \bar{\mathbf{x}}\in\Omega is a global minimizer of (1.3) if and only if it is a global minimizer of (1.7). Moreover, problems (1.3) and (1.7) have the same optimal value.
Proof. On the one hand, let \bar{\mathbf{x}}\in\Omega be a global minimizer of (1.7); then, according to Definition 2.1, \bar{\mathbf{x}} is a lifted stationary point of (1.7). By (2.10), W_i\bar{\mathbf{x}}\in(-\nu, \nu) implies W_i\bar{\mathbf{x}} = 0 , which gives \Phi(W\bar{\mathbf{x}}) = \|W\bar{\mathbf{x}}\|_0 . We have
f(\bar{\mathbf{x}})+\lambda\|W\bar{\mathbf{x}}\|_0 = f(\bar{\mathbf{x}})+\lambda\Phi(W\bar{\mathbf{x}})\leq \ f(\mathbf{x})+\lambda\Phi(W\mathbf{x})\leq f(\mathbf{x})+\lambda\|W\mathbf{x}\|_0,\forall \mathbf{x}\in\Omega, |
where the last inequality uses \Phi(W\mathbf{x})\leq\|W\mathbf{x}\|_0 . Therefore, \bar{\mathbf{x}} is a global minimizer of (1.3).
On the other hand, let \bar{\mathbf{x}}\in\Omega be a global minimizer of (1.3), and assume on the contrary that \bar{\mathbf{x}} is not a global minimizer of (1.7). Let \hat{\mathbf{x}} be a global minimizer of (1.7); then we obtain
f(\hat{\mathbf{x}})+\lambda\Phi(W\hat{\mathbf{x}}) < f(\bar{\mathbf{x}})+\lambda\Phi(W\bar{\mathbf{x}}). |
From
\Phi(W\hat{\mathbf{x}}) = \|W\hat{\mathbf{x}}\|_0 \; \; \mathrm{and} \; \; \Phi(W\bar{\mathbf{x}})\leq\|W\bar{\mathbf{x}}\|_0, |
we have
f(\hat{\mathbf{x}})+\lambda\|W\hat{\mathbf{x}}\|_0 < f(\bar{\mathbf{x}})+\lambda\|W\bar{\mathbf{x}}\|_0. |
This contradicts the global optimality of \bar{\mathbf{x}} for (1.3). Hence \bar{\mathbf{x}} is a global minimizer of (1.7). Therefore, (1.3) and (1.7) have the same global minimizers and optimal values.
When f is convex, \bar{\mathbf{x}} is a local minimizer of (1.3) if and only if \bar{\mathbf{x}}\in\Omega satisfies
\begin{equation} \begin{aligned} \mathbf{0}\in\partial f(\bar{\mathbf{x}})+N_\Omega(\bar{\mathbf{x}}). \end{aligned} \end{equation} | (2.12) |
This condition is often used as a criterion for local minimizers of problem (1.3).
Definition 2.6. We call \bar{\mathbf{x}}\in\Omega a \nu -strong local minimizer of (1.3) if there exist \overline{\xi}\in\partial f(\bar{\mathbf{x}}) and \overline{\eta}\in N_\Omega(\bar{\mathbf{x}}) such that for any i\in \mathrm{supp}(W\bar{\mathbf{x}}) , it holds that
\begin{equation*} \mathbf{0} = \overline{\xi}+\overline{\eta}\; \; \text{and} \; \; |W_i\bar{\mathbf{x}}|\geq\nu. \end{equation*} |
By (2.12), any \nu -strong local minimizer of (1.3) is a local minimizer of (1.3). Below, we provide a result on the relationship between the \nu -strong local minimizers of (1.3) and the lifted stationary points of (1.7).
Proposition 2.7. \bar{\mathbf{x}}\in\Omega is a \nu -strong local minimizer of (1.3) if and only if it is a lifted stationary point of (1.7).
Proof. First, by (2.9), we see that if \bar{\mathbf{x}} is a lifted stationary point of (1.7), then
\mathcal{W}_{\mathcal{\ell}_0}(\bar{\mathbf{x}}) = f(\bar{\mathbf{x}})+\lambda\|W\bar{\mathbf{x}}\|_0 = f(\bar{\mathbf{x}})+\lambda\Phi(W\bar{\mathbf{x}}) = f(\bar{\mathbf{x}})+\lambda\Phi^{d^{W\bar{\mathbf{x}}}}(W\bar{\mathbf{x}})\leq f(\mathbf{x})+\lambda\Phi^{d^{W\bar{\mathbf{x}}}}(W\mathbf{x}),\; \; \; \forall\mathbf{x}\in\Omega. |
Combining Lemma 2.3 and the relation \Phi^{d^{W\bar{\mathbf{x}}}}(W\mathbf{x})\leq\|W\mathbf{x}\|_0, \; \forall \mathbf{x}\in\mathbb{B}_\varrho(\bar{\mathbf{x}}), \; \varrho > 0 from (2.11), we have
\mathcal{W}_{\mathcal{\ell}_0}(\bar{\mathbf{x}})\leq\mathcal{W}_{\mathcal{\ell}_0}(\mathbf{x}),\; \; \; \forall\mathbf{x}\in\Omega\cap\mathbb{B}_\varrho(\bar{\mathbf{x}}), |
and since Lemma 2.3 ensures that |W_i\bar{\mathbf{x}}|\geq\nu for every i\in \mathrm{supp}(W\bar{\mathbf{x}}) , \bar{\mathbf{x}} is a \nu -strong local minimizer of (1.3).
Next, since \bar{\mathbf{x}} is a \nu -strong local minimizer of (1.3), it is also a local minimizer of (1.3). Suppose that \bar{\mathbf{x}} is a local minimizer of (1.3) but not a local minimizer of (1.7). Then there exists a local minimizer of (1.7), denoted by \hat{\mathbf{x}} ; combining (2.9), the relation \Phi^{d^{W\hat{\mathbf{x}}}}(W\bar{\mathbf{x}})\leq\|W\bar{\mathbf{x}}\|_0, \; \forall \bar{\mathbf{x}}\in\mathbb{B}_\varrho(\hat{\mathbf{x}}) from (2.11), and \Phi(W\hat{\mathbf{x}}) = \|W\hat{\mathbf{x}}\|_0 , we have
f(\hat{\mathbf{x}})+\lambda\|W\hat{\mathbf{x}}\|_0 = f(\hat{\mathbf{x}})+\lambda\Phi(W\hat{\mathbf{x}}) = f(\hat{\mathbf{x}})+\lambda\Phi^{d^{W\hat{\mathbf{x}}}}(W\hat{\mathbf{x}})\leq f(\bar{\mathbf{x}})+\lambda\Phi^{d^{W\hat{\mathbf{x}}}}(W\bar{\mathbf{x}})\leq f(\bar{\mathbf{x}})+\lambda\|W\bar{\mathbf{x}}\|_0,\; \forall \bar{\mathbf{x}}\in\mathbb{B}_\varrho(\hat{\mathbf{x}}), |
which leads to a contradiction. Thus, every local minimizer of (1.3) is a local minimizer of (1.7); that is, a \nu -strong local minimizer of (1.3) is a lifted stationary point of (1.7).
We use Table 1 to clearly demonstrate the relationship between (1.3) and (1.7).
Table 1. Relationship between problems (1.3) and (1.7).
The main goal of this section is to find a lifted stationary point of (1.7). Due to the presence of the matrix W , we cannot obtain an explicit solution using the proximal gradient method. The first subsection approximates f by a smoothing function and presents some preliminary results on smoothing methods; the second subsection proposes our algorithm; and the third subsection conducts a convergence analysis of the proposed algorithm for solving (1.7).
Throughout this paper, we approximate the loss function f by a smoothing function \tilde{f} in (1.7). When it is clear from the context, the derivative of \tilde{f}(\mathbf{x}, \mu) with respect to \mathbf{x} is simply denoted as \nabla\tilde{f}(\mathbf{x}, \mu) . We denote
\widetilde{\mathcal{W}}^d(\mathbf{x},\mu): = \tilde{f}(\mathbf{x},\mu)+\lambda\Phi^d(W\mathbf{x}),\; \; \widetilde{\mathcal{W}}(\mathbf{x},\mu): = \tilde{f}(\mathbf{x},\mu)+\lambda\Phi(W\mathbf{x}), |
where smoothing parameter \mu > 0 and d\in\mathbb{D}^p . For any fixed \mu > 0 and d\in\mathbb{D}^p , \widetilde{\mathcal{W}}^d(\mathbf{x}, \mu) is nonsmooth and convex, and \widetilde{\mathcal{W}}(\mathbf{x}, \mu) is nonsmooth and nonconvex. Due to
\Phi(W\mathbf{x}) = \min\limits_{d\in\mathbb{D}^p}\Phi^d(W\mathbf{x}),\; \forall \mathbf{x}\in\Omega, |
we obtain
\begin{equation} \begin{aligned} \widetilde{\mathcal{W}}^d(\mathbf{x},\mu)\geq\widetilde{\mathcal{W}}(\mathbf{x},\mu),\; \forall d\in\mathbb{D}^p,\; \mathbf{x}\in\Omega,\; \mu\in(0,\bar{\mu}]. \end{aligned} \end{equation} | (3.1) |
The following definition collects the properties of the smoothing function \tilde{f} that are frequently used in the convergence analysis.
Definition 3.1. We call \tilde{f}:\mathbb{R}^n\times[0, \bar{\mu}]\rightarrow \mathbb{R} with \bar{\mu} > 0 a smoothing function of the convex function f in (1.7) if \tilde{f}(\cdot, \mu) is continuously differentiable in \mathbb{R}^n for any fixed \mu > 0 and satisfies the following conditions:
(ⅰ) \lim_{\mathbf{z}\rightarrow \mathbf{x}, \mu\downarrow0}\tilde{f}(\mathbf{z}, \mu) = f(\mathbf{x}), \forall \mathbf{x}\in\Omega;
(ⅱ) (convexity) \tilde{f}(\mathbf{x}, \mu) is convex with respect to \mathbf{x} in \Omega for any fixed \mu > 0 ;
(ⅲ) (gradient consistency) \{\lim_{\mathbf{z}\rightarrow \mathbf{x}, \mu\downarrow0}\bigtriangledown_\mathbf{z}\tilde{f}(\mathbf{z}, \mu)\}\subseteq\partial f(\mathbf{x}), \forall \mathbf{x}\in\Omega;
(ⅳ) (Lipschitz continuity with respect to \mu ) there exists a positive constant \kappa such that
|\tilde{f}(\mathbf{x},\mu_2)-\tilde{f}(\mathbf{x},\mu_1)|\leq\kappa|\mu_1-\mu_2|,\forall \mathbf{x}\in\Omega,\mu_1,\mu_2\in[0,\bar{\mu}]; |
(ⅴ) (Lipschitz continuity with respect to \mathbf{x} ) there exists a constant L > 0 such that for any \mu\in(0, \bar{\mu}], \bigtriangledown_\mathbf{x}\tilde{f}(\cdot, \mu) is Lipschitz continuous on \Omega with Lipschitz constant L\mu^{-1}.
By virtue of Definition 3.1-(ⅳ), we obtain that
\begin{equation} \begin{aligned} &|\tilde{f}(\mathbf{x},\mu)-f(\mathbf{x})|\leq\kappa\mu,\forall \mathbf{x}\in\Omega,0 < \mu\leq\bar{\mu}. \end{aligned} \end{equation} | (3.2) |
Next, we aim to solve the following problem with \mu > 0 and vector d\in\mathbb{D}^p
\begin{equation} \begin{aligned} \min\limits_{\mathbf{x}\in\Omega}\; \; \; \widetilde{\mathcal{W}}^d(\mathbf{x},\mu) = \tilde{f}(\mathbf{x},\mu)+\lambda\Phi^d(W\mathbf{x}), \end{aligned} \end{equation} | (3.3) |
by introducing an approximation of \widetilde{\mathcal{W}}^d(\cdot, \mu) around a given point \mathbf{z} as follows:
\begin{equation} \begin{aligned} G_{d,\gamma}(\mathbf{x},\mathbf{z},\mu) = \tilde{f}(\mathbf{z},\mu)+\langle \mathbf{x}-\mathbf{z},\nabla\tilde{f}(\mathbf{z},\mu)\rangle+\frac{1}{2}\gamma\mu^{-1}\|\mathbf{x}-\mathbf{z}\|^2+\lambda\Phi^d(W\mathbf{x}) \end{aligned} \end{equation} | (3.4) |
with a constant \gamma > 0 . Since \Phi^d(W\mathbf{x}) is convex with respect to \mathbf{x} for any fixed d\in\mathbb{D}^p , the function G_{d, \gamma}(\mathbf{x}, \mathbf{z}, \mu) is strongly convex with respect to \mathbf{x} for any fixed d, \gamma, \mathbf{z} and \mu . Then, we solve the following problem
\min\limits_{\mathbf{x}\in\Omega}G_{d,\gamma}(\mathbf{x},\mathbf{z},\mu) |
to find the approximate solution of (3.3).
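The surrogate (3.4) is straightforward to evaluate once the smoothing function is available; the following is a minimal sketch, in which ftilde and gradf are assumed handles for \tilde{f} and its gradient (concrete instances are constructed in Section 4), and W , \lambda , \nu are problem data.

```matlab
% Surrogate G_{d,gamma}(x,z,mu) in (3.4) around a point z (sketch).
theta_d = @(t,d) (d==2).*(t/nu - 1) + (d==3).*(-t/nu - 1);      % theta_{d_i}; theta_1 = 0
Phid    = @(Wx,d) sum(abs(Wx)/nu - theta_d(Wx,d));              % Phi^d(Wx) in (2.8)
G = @(x,z,mu,gam,d) ftilde(z,mu) + (x-z)'*gradf(z,mu) ...
    + 0.5*(gam/mu)*norm(x-z)^2 + lambda*Phid(W*x,d);
```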
In this subsection, we propose a new algorithm (see Algorithm 1) for finding a lifted stationary point of (1.7). In particular, since W is a general linear mapping and \Phi(W\mathbf{x}) is not separable, the smoothing proximal gradient algorithm of [9] cannot solve its subproblem explicitly. The proposed algorithm combines the smoothing method with gradient descent steps, so we call it the smoothing gradient descent (SGD) algorithm, and we use it to obtain approximate solutions of the subproblem. Let
\mathcal{P}^s = \{k\in\mathbb{N}:\mu_{k+1}\neq\mu_k\}, |
and denote by p_r^s the r -th smallest number in \mathcal{P}^s . Then, we obtain the following updating rule for \{\mu_k\} :
\begin{equation} \begin{aligned} \mu_k = \mu_{p_r^s+1} = \frac{\mu_0}{(p_r^s+1)^\sigma},\forall p_r^s+1\leq k\leq p_{r+1}^s, \end{aligned} \end{equation} | (3.5) |
which will be used in the proof of Lemmas 3.2 and 3.4.
Algorithm 1: Smoothing Gradient Descent (SGD) algorithm
Require: Take \mathbf{x}^{-1} = \mathbf{x}^0\in\Omega and \mu_{-1} = \mu_0\in(0, \bar{\mu}] . Give parameters \rho > 1 , \sigma > \frac{1}{2} , \alpha > 0 and 0 < \underline{\gamma} < \bar{\gamma} . Set k = 0 .
1: while a termination criterion is not met do
2: Step 1. Choose \gamma_k\in[\underline{\gamma}, \bar{\gamma}] and let d^k\triangleq d^{W\mathbf{x}^k} , where d^{W\mathbf{x}^k} is defined in (2.5).
3: Step 2.
4: 2a) Compute
{\hat{\mathbf{x}}}^{k+1} = \arg\min\limits_{\mathbf{x}\in\Omega}\; G_{d^k, \gamma_k}(\mathbf{x}, \mathbf{x}^{k}, \mu_k). (3.6)
5: 2b) If {\hat{\mathbf{x}}}^{k+1} satisfies
\widetilde{\mathcal{W}}^{d^k}({\hat{\mathbf{x}}}^{k+1}, \mu_k)\leq G_{d^k, \gamma_k}({\hat{\mathbf{x}}}^{k+1}, \mathbf{x}^{k}, \mu_k), (3.7)
6: set
\mathbf{x}^{k+1} = {\hat{\mathbf{x}}}^{k+1} (3.8)
7: and go to Step 3. Otherwise, let \gamma_k = \rho\gamma_k and return to 2a).
8: Step 3. If
\widetilde{\mathcal{W}}(\mathbf{x}^{k+1}, \mu_k)+\kappa\mu_k-\widetilde{\mathcal{W}}(\mathbf{x}^{k}, \mu_{k-1})-\kappa\mu_{k-1}\leq-\alpha\mu_k^2, (3.9)
9: set \mu_{k+1} = \mu_k ; otherwise, set
\mu_{k+1} = \frac{\mu_0}{(k+1)^\sigma}. (3.10)
10: Increment k by one and return to Step 1.
11: end while
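To make the iteration concrete, the outer loop of Algorithm 1 can be sketched in MATLAB as follows. This is an illustrative sketch only: solve_subproblem is a placeholder for an approximate solver of (3.6) (which, as discussed above, has no closed-form solution and is handled by a smoothing gradient routine), ftilde, gradf, Phid and G are the handles assumed in the earlier sketch, and the parameters x0, mu0, gam0, Maxiter, epsilon, etc. are assumed to be set as in Section 4.

```matlab
% Sketch of Algorithm 1 (SGD); helper handles and parameters are assumptions, see lead-in.
calWt = @(xx,mm) ftilde(xx,mm) + lambda*sum(min(1, abs(W*xx)/nu));   % smoothed objective
x = x0; mu = mu0; muprev = mu0; k = 0;
while k < Maxiter && mu > epsilon                                    % stopping rule (4.1)
    gam = gam0;                                                      % any gamma_k in [lo, hi]
    d = ones(p,1); d(W*x >= nu) = 2; d(W*x <= -nu) = 3;              % Step 1, via (2.5)
    while true                                                       % Step 2
        xhat = solve_subproblem(x, mu, gam, d);                      % 2a): approximate (3.6)
        if ftilde(xhat,mu) + lambda*Phid(W*xhat,d) <= G(xhat,x,mu,gam,d)
            break;                                                   % 2b): test (3.7) passed
        end
        gam = rho*gam;                                               % enlarge gamma, redo 2a)
    end
    xnew = xhat;                                                     % (3.8)
    if calWt(xnew,mu) + kappa*mu - calWt(x,muprev) - kappa*muprev <= -alpha*mu^2
        munew = mu;                                                  % (3.9) holds: keep mu
    else
        munew = mu0/(k+1)^sigma;                                     % (3.10)
    end
    x = xnew; muprev = mu; mu = munew; k = k + 1;
end
```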
Lemma 3.2. The proposed SGD algorithm is well-defined, and the sequences \{\mathbf{x}^k\} , \{\gamma_k\} , and \{\mu_k\} generated by it have the following properties:
(ⅰ) \{\mathbf{x}^k\}\subseteq\Omega and \{\gamma_k\}\subseteq[\underline{\gamma}, \max\{\overline{\gamma}, \rho L\}];
(ⅱ) \mathcal{P}^s contains infinitely many elements and \lim_{k\rightarrow \infty}\mu_k = 0.
Proof. (ⅰ) By rearranging (3.7), we obtain
\tilde{f}(\hat{\mathbf{x}}^{k+1},\mu_k)\leq\tilde{f}({\mathbf{x}}^{k},\mu_k)+\langle\nabla\tilde{f}({\mathbf{x}}^{k},\mu_k),\hat{\mathbf{x}}^{k+1}-\mathbf{x}^k\rangle+\frac{1}{2}\gamma_k\mu_k^{-1}\|\hat{\mathbf{x}}^{k+1}-\mathbf{x}^k\|^2. |
According to Definition 3.1-(ⅴ), (3.7) holds when \gamma_k\geq L . Thus the updating of \gamma_k in Step 2 is at most \log_\rho(\frac{L}{\underline{\gamma}})+1 times at each iteration. Hence, the SGD algorithm is well-defined, and we have that \gamma_k\leq\max\{\overline{\gamma}, \rho L\}, \forall k\in\mathbb{N}. From (3.8), it is easy to verify that \mathbf{x}^{k+1}\in\Omega by \mathbf{x}^{k}\in\Omega and \hat{\mathbf{x}}^{k+1}\in\Omega .
(ⅱ) Since \{\mu_k\} is non-increasing, to prove (ⅱ) we assume by contradiction that \lim_{k\rightarrow \infty}\mu_k = \hat{\mu} > 0 . If \{\mu_k\} converges to a non-zero value, then (3.10) is invoked only finitely many times, which means that there exists \mathrm{K}\in\mathbb{N} such that \mu_k = \hat{\mu}, \forall k\geq\mathrm{K} . Substituting \hat{\mu} into (3.9), we obtain
\widetilde{\mathcal{W}}(\mathbf{x}^{k+1},\mu_k)+\kappa\mu_k-\widetilde{\mathcal{W}}(\mathbf{x}^{k},\mu_{k-1})-\kappa\mu_{k-1}\leq-\alpha\hat{\mu}^2, \forall k\geq\mathrm{K}+1. |
By the above inequality, we have
\begin{equation} \begin{aligned} \lim\limits_{k\rightarrow \infty}\widetilde{\mathcal{W}}(\mathbf{x}^{k+1},\mu_k)+\kappa\mu_k = -\infty. \end{aligned} \end{equation} | (3.11) |
However, by \{\mathbf{x}^k\}\subseteq\Omega , (3.2) and Theorem 2.5, then
\begin{equation} \begin{aligned} \widetilde{\mathcal{W}}(\mathbf{x}^{k+1},\mu_k)+\kappa\mu_k\geq\mathcal{W}(\mathbf{x}^{k+1})\geq\min\limits_{\mathbf{x}\in\Omega}\mathcal{W}(\mathbf{x}) = \min\limits_{\mathbf{x}\in\Omega}\mathcal{W}_{\ell_0}(\mathbf{x}),\; \; \forall k\geq\mathrm{K}, \end{aligned} \end{equation} | (3.12) |
(3.11) and (3.12) contradict each other, so (ⅱ) holds.
Lemma 3.3. For any k\in\mathbb{N} , we have
\begin{equation} \begin{aligned} \widetilde{\mathcal{W}}(\mathbf{x}^{k+1},\mu_k)-\widetilde{\mathcal{W}}(\mathbf{x}^{k},\mu_k)\leq-\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^k\|^2, \end{aligned} \end{equation} | (3.13) |
which implies \{\widetilde{\mathcal{W}}(\mathbf{x}^{k+1}, \mu_k)+\kappa\mu_k\} is non-increasing and \lim\limits_{k\rightarrow \infty}\widetilde{\mathcal{W}}(\mathbf{x}^{k+1}, \mu_k) = \lim\limits_{k\rightarrow \infty}\mathcal{W}(\mathbf{x}^{k}).
Proof. Since G_{d^k, \gamma_k}(\mathbf{x}, \mathbf{x}^k, \mu_k) is strongly convex with modulus \gamma_k\mu_k^{-1}, we have
\begin{equation} \begin{aligned} G_{d^k,\gamma_k}(\mathbf{x},\mathbf{x}^k,\mu_k)\geq G_{d^k,\gamma_k}(\hat{\mathbf{x}}^{k+1},\mathbf{x}^k,\mu_k)+\langle\nabla G_{d^k,\gamma_k}(\hat{\mathbf{x}}^{k+1},\mathbf{x}^k,\mu_k),\mathbf{x}-\hat{\mathbf{x}}^{k+1}\rangle\\ +\frac{1}{2}\gamma_k\mu_k^{-1}\|\hat{\mathbf{x}}^{k+1}-\mathbf{x}\|^2, \end{aligned} \end{equation} | (3.14) |
using the definition of \hat{\mathbf{x}}^{k+1} in (3.6) and \mathbf{x}^{k+1} = \hat{\mathbf{x}}^{k+1} when (3.7) holds, we obtain
G_{d^k,\gamma_k}(\mathbf{x}^{k+1},\mathbf{x}^{k},\mu_k)\leq G_{d^k,\gamma_k}(\mathbf{x},\mathbf{x}^k,\mu_k)-\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}\|^2,\forall \mathbf{x}\in\Omega. |
By the definition of function G_{d^k, \gamma_k} given in (3.4), organizing the inequalities above, we have
\begin{equation} \begin{aligned} \lambda\Phi^{d^k}(W\mathbf{x}^{k+1})\leq&\lambda\Phi^{d^k}(W\mathbf{x})+\langle \mathbf{x}-\mathbf{x}^{k+1},\nabla\tilde{f}(\mathbf{x}^k,\mu_k)\rangle\\ &+\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}-\mathbf{x}^k\|^2 -\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^k\|^2\\ &-\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}\|^2. \end{aligned} \end{equation} | (3.15) |
Moreover, (3.7) can be written as
\begin{equation} \begin{aligned} \widetilde{\mathcal{W}}^{d^k}(\mathbf{x}^{k+1},\mu_k)\leq&\tilde{f}(\mathbf{x}^k,\mu_k)+\langle \mathbf{x}^{k+1}-\mathbf{x}^k,\nabla\tilde{f}(\mathbf{x}^k,\mu_k)\rangle\\ &+\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^k\|^2 +\lambda\Phi^{d^k}(W\mathbf{x}^{k+1}). \end{aligned} \end{equation} | (3.16) |
Summing up (3.15) and (3.16), we obtain that
\begin{equation} \begin{aligned} \widetilde{\mathcal{W}}^{d^k}(\mathbf{x}^{k+1},\mu_k)\leq&\tilde{f}(\mathbf{x}^k,\mu_k)+\lambda\Phi^{d^k}(W\mathbf{x})+\langle \mathbf{x}-\mathbf{x}^k,\nabla\tilde{f}(\mathbf{x}^k,\mu_k)\rangle\\ &+\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}-\mathbf{x}^k\|^2-\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}\|^2,\forall \mathbf{x}\in\Omega. \end{aligned} \end{equation} | (3.17) |
For a fixed \mu > 0 , the convexity of \tilde{f}(\mathbf{x}, \mu) with respect to \mathbf{x} indicates
\begin{equation} \begin{aligned} \tilde{f}(\mathbf{x}^k,\mu_k)+\langle \mathbf{x}-\mathbf{x}^k,\nabla\tilde{f}(\mathbf{x}^k,\mu_k)\rangle\leq\tilde{f}(\mathbf{x},\mu_k),\forall \mathbf{x}\in\Omega. \end{aligned} \end{equation} | (3.18) |
Combining (3.17) and (3.18) and utilizing the definition of \widetilde{\mathcal{W}}^{d^k} , one has
\begin{equation} \begin{aligned} \widetilde{\mathcal{W}}^{d^k}(\mathbf{x}^{k+1},\mu_k)\leq\widetilde{\mathcal{W}}^{d^k}(\mathbf{x},\mu_k)+\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}-\mathbf{x}^k\|^2-\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}\|^2,\\ \forall \mathbf{x}\in\Omega. \end{aligned} \end{equation} | (3.19) |
Letting \mathbf{x} = \mathbf{x}^k in (3.19) and by d^k = d^{{W\mathbf{x}}^k} , we obtain
\begin{equation} \begin{aligned} \widetilde{\mathcal{W}}^{d^k}(\mathbf{x}^{k+1},\mu_k)+\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^k\|^2\leq\widetilde{\mathcal{W}}(\mathbf{x}^{k},\mu_k). \end{aligned} \end{equation} | (3.20) |
Because \widetilde{\mathcal{W}}^{d^k}(\mathbf{x}^{k+1}, \mu_k)\geq\widetilde{\mathcal{W}}(\mathbf{x}^{k+1}, \mu_k), (3.20) leads to (3.13). Due to Definition 3.1-(iv), we have
\tilde{f}(\mathbf{x}^k,\mu_{k-1})\geq\tilde{f}(\mathbf{x}^k,\mu_k)-\kappa(\mu_{k-1}-\mu_{k}), |
then, it follows that
\widetilde{\mathcal{W}}(\mathbf{x}^{k},\mu_k)\leq\widetilde{\mathcal{W}}(\mathbf{x}^{k},\mu_{k-1})+\kappa(\mu_{k-1}-\mu_k), |
by (3.13), we obtain
\begin{equation} \begin{aligned} \widetilde{\mathcal{W}}(\mathbf{x}^{k+1},\mu_k)+\kappa\mu_k+\frac{1}{2}\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^k\|^2\leq\widetilde{\mathcal{W}}(\mathbf{x}^{k},\mu_{k-1})+\kappa\mu_{k-1}, \end{aligned} \end{equation} | (3.21) |
(3.21) implies the non-increasing property of \{\widetilde{\mathcal{W}}(\mathbf{x}^{k+1}, \mu_k)+\kappa\mu_k\}. This result and (3.12) ensure the existence of \lim_{k\rightarrow \infty}\widetilde{\mathcal{W}}(\mathbf{x}^{k+1}, \mu_k)+\kappa\mu_k . By virtue of \lim_{k\rightarrow \infty}\mu_k = 0 and Definition 3.1-(i), we obtain
\lim\limits_{k\rightarrow \infty}\widetilde{\mathcal{W}}(\mathbf{x}^{k+1},\mu_k) = \lim\limits_{k\rightarrow \infty}\mathcal{W}(\mathbf{x}^k). |
The proof is completed.
Lemma 3.4. The following statements hold:
(ⅰ) \sum_{k = 0}^\infty\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^k\|^2\leq2(\widetilde{\mathcal{W}}(\mathbf{x}^0, \mu_{-1})+\kappa\mu_{-1}-\min_\Omega\mathcal{W});
(ⅱ) \sum_{k = 0}^\infty\mu_k^2\leq\Lambda with \Lambda = \frac{1}{\alpha}(\widetilde{\mathcal{W}}(\mathbf{x}^0, \mu_{-1})+\kappa\mu_{-1}-\min_{\mathbf{x}\in\Omega}\mathcal{W}(\mathbf{x}))+\frac{2\mu_0^2\sigma}{2\sigma-1} < \infty .
Proof. (ⅰ) Recalling (3.21), for all k\in\mathbb{N} , we obtain
\begin{equation} \begin{aligned} \gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^k\|^2\leq2(\widetilde{\mathcal{W}}(\mathbf{x}^k,\mu_{k-1})+\kappa\mu_{k-1}-\widetilde{\mathcal{W}}(\mathbf{x}^{k+1},\mu_{k})-\kappa\mu_{k}). \end{aligned} \end{equation} | (3.22) |
Now adding up the above inequality over k = 0, \ldots, \mathrm{K} , it gives
\begin{equation} \begin{aligned} \sum\limits_{k = 0}^\mathrm{K}\gamma_k\mu_k^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^k\|^2\leq2(\widetilde{\mathcal{W}}(\mathbf{x}^0,\mu_{-1})+\kappa\mu_{-1}-\widetilde{\mathcal{W}}(\mathbf{x}^{\mathrm{K}+1},\mu_{\mathrm{K}})-\kappa\mu_{\mathrm{K}}). \end{aligned} \end{equation} | (3.23) |
Letting \mathrm{K} in (3.23) tend to infinity and using (3.12), we obtain (ⅰ).
(ⅱ) From (3.5), we have
\begin{equation} \begin{aligned} \sum\limits_{k\in\mathcal{P}^s}\mu_k^2 = \sum\limits_{r = 1}^\infty\frac{\mu_0^2}{(p_r^s+1)^{2\sigma}}\leq\sum\limits_{k = 1}^\infty\frac{\mu_0^2}{k^{2\sigma}}\leq\frac{2\mu_0^2\sigma}{2\sigma-1}, \end{aligned} \end{equation} | (3.24) |
where p_r^s is the r th smallest element in \mathcal{P}^s . When k\notin\mathcal{P}^s , (3.9) gives
\alpha\mu_k^2\leq\widetilde{\mathcal{W}}(\mathbf{x}^k,\mu_{k-1})+\kappa\mu_{k-1}-\widetilde{\mathcal{W}}(\mathbf{x}^{k+1},\mu_{k})-\kappa\mu_{k}, |
which together with the non-increasing property of \{\widetilde{\mathcal{W}}(\mathbf{x}^{k+1}, \mu_k)+\kappa\mu_k\} and (3.12) implies
\begin{equation} \begin{aligned} \sum\limits_{k\notin\mathcal{P}^s}\mu_k^2\leq\frac{1}{\alpha}(\widetilde{\mathcal{W}}(\mathbf{x}^0,\mu_{-1})+\kappa\mu_{-1}-\min\limits_\Omega\mathcal{W}). \end{aligned} \end{equation} | (3.25) |
Combining (3.24) and (3.25), the proof of (ii) is completed.
Theorem 3.5. If there is an accumulation point in \{\mathbf{x}^k:k\in\mathcal{P}^s\} , then the accumulation point is a lifted stationary point of (1.7).
Proof. Since (3.9) fails for k\in\mathcal{P}^s , by rearranging (3.21), we obtain that
\gamma_{k}\mu_{k}^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^{k}\|^2\leq2\alpha\mu_{k}^2, |
which gives
\|\mathbf{x}^{k+1}-\mathbf{x}^{k}\|\leq\sqrt{2\alpha\gamma_{k}^{-1}\mu_{k}^3}. |
Thus,
\gamma_{k}\mu_{k}^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^{k}\|\leq\sqrt{2\alpha\gamma_{k}\mu_{k}}, |
which together with \lim_{k\rightarrow \infty}\mu_{k} = 0 and \{\gamma_{k}\}\subseteq[\underline{\gamma}, \max\{\bar{\gamma}, \rho L\}] implies
\begin{equation} \begin{aligned} \lim\limits_{k\rightarrow \infty}\gamma_{k}\mu_{k}^{-1}\|\mathbf{x}^{k+1}-\mathbf{x}^{k}\| = 0 \; \; \mathrm{and} \; \lim\limits_{k\rightarrow \infty}\|\mathbf{x}^{k+1}-\mathbf{x}^k\| = 0. \end{aligned} \end{equation} | (3.26) |
Let \bar{\mathbf{x}} be an accumulation point of \{\mathbf{x}^k\}_{k\in\mathcal{P}^s} , so that there exists a subsequence \{\mathbf{x}^{k_t}\}_{k_t\in\mathcal{P}^s} converging to \bar{\mathbf{x}} . Together with (3.26), this implies
\begin{equation} \begin{aligned} \lim\limits_{t\rightarrow \infty}\gamma_{k_t}\mu_{k_t}^{-1}\|\mathbf{x}^{k_t+1}-\mathbf{x}^{k_t}\| = 0 \; \; \mathrm{and} \; \lim\limits_{t\rightarrow \infty}\mathbf{x}^{k_t+1} = \bar{\mathbf{x}}. \end{aligned} \end{equation} | (3.27) |
Recalling \mathbf{x}^{k_t+1} = \hat{\mathbf{x}}^{k_t+1} defined in (3.6) and by its first-order necessary optimality condition, we have
\begin{equation} \begin{aligned} &\langle\nabla\tilde{f}(\mathbf{x}^{k_t},\mu_{k_t})+\gamma_{k_t}\mu_{k_t}^{-1}(\mathbf{x}^{k_t+1}-\mathbf{x}^{k_t})+\lambda\zeta^{k_t},\mathbf{x}-\mathbf{x}^{k_t+1}\rangle\geq0,\\ & \ \ \ \ \forall\zeta^{k_t}\in\partial\Phi^{d^{k_t}}(W\mathbf{x}^{k_t+1}),\mathbf{x}\in\Omega. \end{aligned} \end{equation} | (3.28) |
Since \{d^{k_t}:t\in\mathbb{N}\} takes only finitely many values and \lim_{t\rightarrow \infty}\mathbf{x}^{k_t+1} = \bar{\mathbf{x}} , there exist a subsequence of \{k_t\} , denoted by \{k_{t_j}\} , and \bar{d}\in\mathbb{D}^p(W\bar{\mathbf{x}}) such that d^{k_{t_j}} = \bar{d}, \forall j\in\mathbb{N} . By the upper semicontinuity of \partial\Phi^{\bar{d}} and \lim_{j\rightarrow \infty}\mathbf{x}^{k_{t_j}+1} = \bar{\mathbf{x}} , it follows that
\begin{equation} \begin{aligned} \left\{\lim\limits_{j\rightarrow \infty}\zeta^{k_{t_j}}:\zeta^{k_{t_j}}\in\partial\Phi^{d^{k_{t_j}}}(W\mathbf{x}^{k_{t_j}+1})\right\}\subseteq\partial\Phi^{\bar{d}}(W\bar{\mathbf{x}}). \end{aligned} \end{equation} | (3.29) |
Along with the subsequence \{k_{t_j}\} and letting j\rightarrow \infty in (3.28), from Definition 3.1-(ⅲ), (3.27) and (3.29), we obtain that there exist \bar{\xi}\in\partial f(\bar{\mathbf{x}}) and \bar{\zeta}^{\bar{d}}\in\partial\Phi^{\bar{d}}(W\bar{\mathbf{x}}) such that
\begin{equation} \begin{aligned} \langle\bar{\xi}+\lambda\bar{\zeta}^{\bar{d}},\mathbf{x}-\bar{\mathbf{x}}\rangle\geq0,\forall \mathbf{x}\in\Omega. \end{aligned} \end{equation} | (3.30) |
By \bar{d}\in\mathbb{D}^p(W\bar{\mathbf{x}}) , thanks to the convexity of f+\lambda\Phi^{\bar{d}} , (3.30) implies
f(\mathbf{x})+\lambda\Phi^{\bar{d}}(W\mathbf{x})-f(\bar{\mathbf{x}})-\lambda\Phi^{\bar{d}}(W\bar{\mathbf{x}}) \geq\langle\bar{\xi}+\lambda\bar{\zeta}^{\bar{d}},\mathbf{x}-\bar{\mathbf{x}}\rangle\geq0, \forall \mathbf{x}\in\Omega, |
which implies that \bar{\mathbf{x}} is a lifted stationary point of (1.7).
The purpose of this section is to verify the theoretical results and the properties of the SGD algorithm through numerical experiments. We present Examples 4.1 and 4.2, which are, respectively, an under-determined linear regression problem and an over-determined censored regression problem. In particular, the process of solving subproblem (3.6) is very similar to that of solving the LASSO problem.
All experiments are performed in MATLAB 2016a on a Lenovo PC with an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz and 8GB RAM. In the following examples, the stopping criterion is set as
\begin{equation} \begin{aligned} \mathrm{number \; of\; iterations} \leq \mathbf{\mathtt{Maxiter}}\; \; \; \mathrm{or}\; \; \; \mu_k\leq\varepsilon. \end{aligned} \end{equation} | (4.1) |
We stop the proposed algorithm if the number of iterations exceeds \mathbf{\mathtt{Maxiter}} or the smoothing parameter is less than \varepsilon . Denote by \bar{\mathbf{x}} the final iterate \mathbf{x}^k output by the algorithm. We set the fixed parameter \alpha = 1 throughout the numerical experiments.
Example 4.1. (Linear regression problem) Linear regression problems have been widely used in information theory [1], signal processing [35,36], and image restoration [6,36]. As pointed out in [20], the \mathcal{\ell}_1 loss function is nonsmooth, but it is more robust and has a stronger outlier-resistance capability than the least squares loss function in linear regression problems. We therefore consider the following \mathcal{\ell}_0 regularized linear regression problem with \mathcal{\ell}_1 loss function:
\begin{equation} \begin{aligned} \min\limits_ {\mathbf{x}\in\Omega} \; \; \; \mathcal{W}_{\mathcal{\ell}_0}(\mathbf{x}): = \frac{1}{m}\|A\mathbf{x}-b\|_1+\lambda\|W\mathbf{\mathbf{x}}\|_0, \end{aligned} \end{equation} | (4.2) |
where A\in \mathbb{R}^{m\times n} with m = n , b\in\mathbb{R}^m . A smoothing function of the \mathcal{\ell}_1 loss function can be defined by
\begin{equation} \begin{aligned} &\tilde{f}(\mathbf{x},\mu) = \frac{1}{m}\sum\limits_{i = 1}^m\tilde{\theta}(A_i\mathbf{x}-b_i,\mu)\; \; with \; \; \tilde{\theta}(s,\mu) = \begin{cases} |s| & if \; |s| > \mu,\\ \frac{s^2}{2\mu}+\frac{\mu}{2} & if \; |s|\leq\mu.\\ \end{cases} \end{aligned} \end{equation} | (4.3) |
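This smoothing function and its gradient can be coded directly as handles; the following is a sketch, and the gradient expression is our own derivation from (4.3), stated only for illustration.

```matlab
% Smoothing function (4.3) and its gradient as handles (sketch; A, b, m are problem data).
ftilde = @(x,mu) sum( (abs(A*x-b)> mu).*abs(A*x-b) ...
                    + (abs(A*x-b)<=mu).*((A*x-b).^2/(2*mu) + mu/2) )/m;
gradf  = @(x,mu) (A'*( (abs(A*x-b)> mu).*sign(A*x-b) ...
                     + (abs(A*x-b)<=mu).*((A*x-b)/mu) ))/m;
```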
Denote s the \mathcal{\ell}_0 norm of true solution \mathbf{x}^* , i.e., \|W\mathbf{x}^*\|_0 = s . For the given positive integers m, n and s , the data are generated by
{\mathbf{\mathtt{W = randn(p,n)}}; \mathbf{\mathtt{B = randn(n,m)}}; \mathbf{ \mathtt{A = orth(B)'}}; \; \mathbf{\mathtt{b = A* \mathbf{x}^* +0.01*randn(m,1)}}.} |
In the algorithm, we set the parameters as follows: \underline{\gamma} = \overline{\gamma} = 1, \; \mu_0 = 3.533, \; \mathbf{\mathtt{Maxiter}} = 10^3, \; \nu = 35.6014, \; \sigma = 3.0003, \; \rho = 1.0001, \; \kappa = \frac{1}{2}. We generate A, b and \mathbf{x}^* with m = n = 45 , p = 45 and s = 2 , set \lambda = 10^{-3} in (4.2) and \varepsilon = 10^{-3} in the stopping criterion (4.1), and take \mathbf{x}_0 = ones(n, 1) . Figure 1 shows the numerical results: it plots \mathbf{x}^* and \overline{\mathbf{x}} , where \mathbf{x}^* denotes the original signal (i.e., the true solution) and \overline{\mathbf{x}} denotes the output of the SGD algorithm. From Figure 1, we can see that the output of the SGD algorithm is very close to the original signal.
Now we use another form of matrix W to solve Example 4.1:
W = \begin{pmatrix} 0& 0 & 0 & \cdots &0& 0 \\ -3 & 1 & 0 & \cdots &0& 0 \\ 0 &-3 & 1 & \cdots &0& 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots& \vdots\\ 0 & 0 & 0 & \cdots &-3& 1 \end{pmatrix}_{p \times n}. |
Set \underline{\gamma} = \overline{\gamma} = 1, \; \mu_0 = 3.533, \; \mathbf{\mathtt{Maxiter}} = 10^3, \; \nu = 36, \; \sigma = 7, \; \rho = 1.0001 \; \mathrm{and}\; \kappa = \frac{1}{2}. We randomly generate the data as follows:
{\mathbf{\mathtt{B = randn(n,m)}};\mathbf{ \mathtt{A = orth(B)'}}; \; \mathbf{\mathtt{b = A* \mathbf{x}^* +0.01*randn(m,1)}}.} |
We run numerical experiments with (m, n, p, s) = (45, 45, 45, 2). Set \lambda = 10^{-3} in (4.2) and \varepsilon = 10^{-3} in the stopping criterion (4.1). We define \mathbf{x}_0 = randn(n, 1) . From Figure 2, we can see that the output of \mathbf{x}^k obtained by the SGD algorithm is also close to the true solution \mathbf{x}^* .
The last special case of W is the penalty matrix in 1-dimensional Fused LASSO [39]:
W = \begin{pmatrix} -1& 1 & 0 & \cdots &0& 0 \\ 0 & -1 & 1 & \cdots &0& 0 \\ 0 &0 & -1 & \cdots &0& 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots& \vdots\\ 0 & 0 & 0 & \cdots &-1& 1 \end{pmatrix}_{p \times n}. |
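This difference matrix can be generated compactly in MATLAB; a minimal sketch, with p = n-1 as in the setting below:

```matlab
% 1-D fused-LASSO difference matrix: row i equals e_{i+1}-e_i, i.e. the [-1 1] pattern.
n = 46; p = n - 1;
W = diff(eye(n));   % p-by-n, -1 on the main diagonal, 1 on the first superdiagonal
```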
Set \nu = 40, \; p = 45, \; n = 46 and \mathbf{x}_0 = ones(n, 1) . The remaining parameter values are the same as the previous situation (see Figure 3). From Figures 1–3, we can see that the output of \mathbf{x}^k obtained by the SGD algorithm is close to the true solution \mathbf{x}^* .
Example 4.2. (Censored regression problem) The application of censored regression problems has been studied in machine learning [37], economics [38], and biomedical and technical systems. The following censored regression problem is also a typical class of composite sparse optimization problems with nonsmooth convex loss functions. We now consider the following {\mathcal{\ell}_0} regularized censored regression problem:
\min \limits_ {\mathbf{x}\in\Omega}\; \; \; \mathcal{W}_{\mathcal{\ell}_0}(\mathbf{x}): = \frac{1}{m}\|\max \{A\mathbf{x},0\}-b\|_1+\lambda\|W\mathbf{\mathbf{x}}\|_0, |
where A\in\mathbb{R}^{m\times n} and b\in\mathbb{R}^m . For the loss function in (1.5), a smoothing function of it can be defined by
\begin{equation*} \tilde{f}(\mathbf{x},\mu) = \frac{1}{m}\sum\limits_{i = 1}^m\tilde{\theta}(\tilde{\phi}(A_i\mathbf{x},\mu)-b_i,\mu)\; \; with \; \; \tilde{\phi}(s,\mu) = \begin{cases} \max\{s,0\} & if \; |s| > \mu,\\ \frac{(s+\mu)^2}{4\mu} & if \; |s|\leq\mu.\\ \end{cases} \end{equation*} |
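As in Example 4.1, the value of this smoothing function can be written as a handle by composing the two pieces; a minimal sketch (value only), assuming A , b and m are given:

```matlab
% Smoothing function for the censored loss (value only, sketch; theta~ as in (4.3)).
phi_t   = @(s,mu) (abs(s)> mu).*max(s,0) + (abs(s)<=mu).*((s+mu).^2/(4*mu));
theta_t = @(s,mu) (abs(s)> mu).*abs(s)   + (abs(s)<=mu).*(s.^2/(2*mu) + mu/2);
ftilde  = @(x,mu) sum( theta_t( phi_t(A*x,mu) - b, mu ) )/m;
```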
Set \varepsilon = 10^{-2} , \nu = 16.0009 , \lambda = 10^{-3} , \mu_0 = 10.8999 , \sigma = 4.0003 , \kappa = \frac{1}{2} , \rho = 1.2006 and \mathbf{x}_0 = randn(n, 1) . In this example, we run numerical experiments with (m, n, p, s) = (40, 40, 40, 2), and we randomly generate the problem data as follows:
\mathbf{\mathtt{A = randn(m,n)}}; \; \mathbf{\mathtt{W = randn(p,n)}}; \; \mathbf{ \mathtt{b = max(A* \mathbf{x}^* +0.01*randn(m,1),0)}}. |
The computational results for \mathbf{x}^* and \bar{\mathbf{x}} are shown in Figure 4.
We use the following form of W to solve Example 4.2:
W = \begin{pmatrix} 0& -2 & 0 & 0&\cdots &0& 0 \\ -2 & 1 & -3 & 0&\cdots &0& 0 \\ 0 &-3 & 1 & -4&\cdots &0& 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots& \vdots& \vdots \\ 0 & 0 & 0 & 0&\cdots &1& -m\\ 0 & 0 & 0 & 0&\cdots &-m& 1 \end{pmatrix}_{p\times n}. |
Set \underline{\gamma} = \overline{\gamma} = 3, \; \mu_0 = 2, \; \mathbf{\mathtt{Maxiter}} = 10^3, \; \nu = 1.2200, \; \sigma = 2.6915, \; \rho = 1.0001 \; \mathrm{and}\; \kappa = 0.711. For the given positive integers m, n and s , the data are generated by
\mathbf{\mathtt{A = randn(m,n)}}; \; \mathbf{\mathtt{b = A* \mathbf{x}^* +0.01*randn(m,1)}}. |
We run numerical experiments with (m, n, p, s) = (40, 40, 40, 2). Set \lambda = 10^{-3} in (4.2), \mathbf{x}_0 = ones(n, 1) and \varepsilon = 10^{-3} . From Figures 4 and 5, it can be seen that the output of \mathbf{x}^k is very close to the true solution.
We have studied the composite sparse optimization problem consisting of the sum of a nonsmooth convex function and the {\mathcal{\ell}_0} penalty of a matrix times the coefficient vector. For the original cardinality penalty problem and an exact continuous relaxation with the capped- {\mathcal{\ell}_1} penalty, we have established the consistency between global minimizers of the relaxation problem and the original problem, and shown that local minimizers of the relaxation problem are local minimizers of the original problem. We have proposed the smoothing gradient descent (SGD) algorithm based on the smoothing method and investigated it from both theoretical and algorithmic points of view. In particular, we have proved that any accumulation point of the sequence generated by the algorithm is a lifted stationary point of the relaxation problem. This explains, from the theoretical perspective, why the algorithm is expected to perform well, and our numerical experiments confirm the predicted behavior.
Wei Yang: Conceptualization, writing-original draft, validation, software; Lili Pan: Conceptualization, supervision, funding acquisition, validation, software; Jinhui Wan: Software, methodology, data curation. All authors have read and approved the final version of the manuscript for publication.
The work of the authors was supported by the National Natural Science Foundation of China grant 12271309.
The authors declare no conflicts of interest.
[1] J. M. Carrasco, L. G. Franquelo, J. T. Bialasiewicz, E. Galvan, R. C. PortilloGuisado, M. A. M. Prats, et al., Power-electronic systems for the grid integration of renewable energy sources: a survey, IEEE T. Ind. Electron., 53 (2006), 1002–1016. https://doi.org/10.1109/TIE.2006.878356
[2] C. J. Melián, J. Bascompte, P. Jordano, V. Krivan, Diversity in a complex ecological network with two interaction types, Oikos, 118 (2009), 122–130. https://doi.org/10.1111/j.1600-0706.2008.16751.x
[3] J. Y. Lin, Y. F. Ban, Complex network topology of transportation systems, Transport Rev., 33 (2013), 658–685. https://doi.org/10.1080/01441647.2013.848955
[4] X. Wang, X. Z. Liu, K. She, S. M. Zhong, Pinning impulsive synchronization of complex dynamical networks with various time-varying delay sizes, Nonlinear Anal.-Hybri., 26 (2017), 307–318. https://doi.org/10.1016/j.nahs.2017.06.005
[5] X. S. Yang, J. Q. Lu, Finite-time synchronization of coupled networks with Markovian topology and impulsive effects, IEEE T. Automat. Contr., 61 (2016), 2256–2261. https://doi.org/10.1109/TAC.2015.2484328
[6] L. Zou, Z. D. Wang, H. J. Gao, X. H. Liu, Event-triggered state estimation for complex networks with mixed time delays via sampled data information: the continuous-time case, IEEE T. Cybernetics, 45 (2015), 2804–2815. https://doi.org/10.1109/TCYB.2014.2386781
[7] Y. H. Deng, Z. H. Meng, H. Q. Lu, Adaptive event-triggered state estimation for complex networks with nonlinearities against hybrid attacks, AIMS Mathematics, 7 (2022), 2858–2877. https://doi.org/10.3934/math.2022158
[8] P. Wan, Z. G. Zeng, Synchronization of delayed complex networks on time scales via aperiodically intermittent control using matrix-based convex combination method, IEEE T. Neur. Net. Lear., 34 (2023), 2938–2950. https://doi.org/10.1109/TNNLS.2021.3110321
[9] K. Ding, Q. X. Zhu, Fuzzy model-based quantitative control for prefixed time synchronization of stochastic reaction-diffusion complex networks under cyber-attacks, IEEE T. Autom. Sci. Eng., 2023 (2023), 3329239. https://doi.org/10.1109/TASE.2023.3329239
[10] J. C. Jiang, X. D. Liu, Z. W. Wang, W. P. Ding, S. T. Zhang, H. Xu, Large group decision-making with a rough integrated asymmetric cloud model under multi-granularity linguistic environment, Inform. Sciences, 678 (2024), 120994. https://doi.org/10.1016/j.ins.2024.120994
[11] J. C. Jiang, X. D. Liu, Z. W. Wang, W. P. Ding, S. T. Zhang, Large group emergency decision-making with bi-directional trust in social networks: A probabilistic hesitant fuzzy integrated cloud approach, Inform. Fusion, 102 (2024), 102062. https://doi.org/10.1016/j.inffus.2023.102062
[12] W. T. Hua, Y. T. Wang, C. Y. Liu, New method for global exponential synchronization of multi-link memristive neural networks with three kinds of time-varying delays, Appl. Math. Comput., 471 (2024), 128593. https://doi.org/10.1016/j.amc.2024.128593
[13] J. Liu, Z. H. Xu, L. Xue, Y. B. Wu, C. Y. Sun, Practical fixed-time synchronization of multilayer networks via intermittent event-triggered control, IEEE T. Syst. Man Cy-S., 54 (2024), 2626–2637. https://doi.org/10.1109/TSMC.2023.3341847
[14] Y. Ren, H. J. Jiang, C. Hu, Bipartite synchronization of multilayer signed networks under aperiodic intermittent-based adaptive dynamic event-triggered control, ISA T., 144 (2024), 72–85. https://doi.org/10.1016/j.isatra.2023.10.015
[15] Y. Guo, Y. Z. Li, Bipartite leader-following synchronization of fractional-order delayed multilayer signed networks by adaptive and impulsive controllers, Appl. Math. Comput., 430 (2022), 127243. https://doi.org/10.1016/j.amc.2022.127243
[16] Y. Xu, T. Lin, X. Z. Liu, W. X. Li, Exponential bipartite synchronization of fractional-order multilayer signed networks via hybrid impulsive control, IEEE T. Cybernetics, 53 (2023), 3926–3938. https://doi.org/10.1109/TCYB.2022.3190413
[17] N. Yang, L. T. Liu, H. Su, Stability of multi-link delayed impulsive stochastic complex networks with Markovian switching, J. Franklin I., 360 (2023), 12922–12940. https://doi.org/10.1016/j.jfranklin.2023.10.002
[18] Y. Guo, B. D. Chen, Y. B. Wu, Finite-time synchronization of stochastic multi-links dynamical networks with Markovian switching topologies, J. Franklin I., 357 (2020), 359–384. https://doi.org/10.1016/j.jfranklin.2019.11.045
[19] C. Gao, B. B. Guo, Y. Xiao, J. C. Bao, Aperiodically synchronization of multi-links delayed complex networks with semi-Markov jump and their numerical simulations to single-link robot arms, Neurocomputing, 575 (2024), 127286. https://doi.org/10.1016/j.neucom.2024.127286
[20] J. M. Zhou, C. M. Zhang, H. L. Chen, Exponential stability of stochastic multi-layer complex network with regime-switching diffusion via aperiodically intermittent control, Inform. Sciences, 662 (2024), 120241. https://doi.org/10.1016/j.ins.2024.120241
[21] S. Li, C. Y. Lv, X. H. Ding, Synchronization of stochastic hybrid coupled systems with multi-weights and mixed delays via aperiodically adaptive intermittent control, Nonlinear Anal.-Hybri., 35 (2020), 100819. https://doi.org/10.1016/j.nahs.2019.100819
[22] D. S. Xu, T. Wang, H. Su, Aperiodically intermittent pinning discrete-time observation control for exponential synchronization of stochastic multilayer coupled systems, Neurocomputing, 505 (2022), 203–213. https://doi.org/10.1016/j.neucom.2022.07.020
[23] S. Li, Y. H. Zhang, H. Su, Almost sure synchronization of multilayer networks via intermittent pinning noises: A white-noise-based time-varying coupling, IEEE T. Circuits-I, 68 (2021), 3460–3473. https://doi.org/10.1109/TCSI.2021.3082005
[24] X. R. Mao, Stochastic stabilization and destabilization, Syst. Control Lett., 23 (1994), 279–290. https://doi.org/10.1016/0167-6911(94)90050-7
[25] F. Q. Deng, Q. Luo, X. R. Mao, Stochastic stabilization of hybrid differential equations, Automatica, 48 (2012), 2321–2328. https://doi.org/10.1016/j.automatica.2012.06.044
[26] X. Chen, X. Xiong, M. H. Zhang, W. Li, Public authority control strategy for opinion evolution in social networks, Chaos, 26 (2016), 083105. https://doi.org/10.1063/1.4960121
[27] X. X. Liao, X. Mao, Exponential stability and instability of stochastic neural networks, Stoch. Anal. Appl., 14 (1996), 165–185. https://doi.org/10.1080/07362999608809432
[28] B. Zhang, C. C. Lim, P. Shi, S. L. Xie, F. Q. Deng, Stabilization of a class of nonlinear systems with random disturbance via intermittent stochastic noise, IEEE T. Automat. Contr., 65 (2020), 1318–1324. https://doi.org/10.1109/TAC.2019.2926890
[29] L. Liu, M. Perc, J. D. Cao, Aperiodically intermittent stochastic stabilization via discrete time or delay feedback control, Sci. China Inf. Sci., 62 (2019), 72201. https://doi.org/10.1007/s11432-018-9600-3
[30] X. R. Mao, Stabilization of continuous-time hybrid stochastic differential equations by discrete-time feedback control, Automatica, 49 (2013), 3677–3681. https://doi.org/10.1016/j.automatica.2013.09.005
[31] X. R. Mao, Almost sure exponential stabilization by discrete-time stochastic feedback control, IEEE T. Automat. Contr., 61 (2016), 1619–1624. https://doi.org/10.1109/TAC.2015.2471696
[32] Y. Zhao, Q. X. Zhu, Stabilization of highly nonlinear neutral stochastic systems with Markovian switching by periodically intermittent feedback control, Int. J. Robust Nonlin., 32 (2022), 10201–10214. https://doi.org/10.1002/rnc.6403
[33] Y. Zhao, Q. X. Zhu, Stabilization of stochastic highly nonlinear delay systems with neutral-term, IEEE T. Automat. Contr., 68 (2023), 2544–2551. https://doi.org/10.1109/TAC.2022.3186827
[34] C. X. Zhang, Q. X. Zhu, Exponential stability of random perturbation nonlinear delay systems with intermittent stochastic noise, J. Franklin I., 360 (2023), 792–812. https://doi.org/10.1016/j.jfranklin.2022.12.004
[35] Y. B. Wu, J. L. Zhu, W. X. Li, Intermittent discrete observation control for synchronization of stochastic neural networks, IEEE T. Cybernetics, 50 (2020), 2414–2424. https://doi.org/10.1109/TCYB.2019.2930579
[36] X. L. He, C. K. Ahn, P. Shi, Periodically intermittent stabilization of neural networks based on discrete-time observations, IEEE T. Circuits-II, 67 (2020), 3497–3501. https://doi.org/10.1109/TCSII.2020.3005901
[37] W. Mao, S. R. You, Y. A. Jiang, X. R. Mao, Stochastic stabilization of hybrid neural networks by periodically intermittent control based on discrete-time state observations, Nonlinear Anal.-Hybri., 48 (2023), 101331. https://doi.org/10.1016/j.nahs.2023.101331
[38] Y. B. Wu, Y. C. Li, W. X. Li, Almost surely exponential synchronization of complex dynamical networks under aperiodically intermittent discrete observations noise, IEEE T. Cybernetics, 52 (2022), 2663–2674. https://doi.org/10.1109/TCYB.2020.3022296
[39] X. L. He, H. Y. Zhang, Exponential synchronization of complex networks via feedback control and periodically intermittent noise, J. Franklin I., 359 (2022), 3614–3630. https://doi.org/10.1016/j.jfranklin.2022.03.010
[40] J. Respondek, Matrix black box algorithms: A survey, Bulletin of the Polish Academy of Sciences: Technical Sciences, 70 (2022), e140535. https://doi.org/10.24425/bpasts.2022.140535
[41] A. Khan, T. Abdeljawad, M. A. Alqudah, Neural networking study of worms in a wireless sensor model in the sense of fractal fractional, AIMS Mathematics, 8 (2023), 26406–26424. https://doi.org/10.3934/math.20231348
[42] A. Khan, H. M. Hashim, T. Abdeljawad, Q. M. Al-Mdallal, H. Khan, Stability analysis of fractional nabla difference COVID-19 model, Results Phys., 22 (2021), 103888. https://doi.org/10.1016/j.rinp.2021.103888
[43] P. Bedi, A. Kumar, T. Abdeljawad, A. Khan, Study of Hilfer fractional evolution equations by the properties of controllability and stability, Alex. Eng. J., 60 (2021), 3741–3749. https://doi.org/10.1016/j.aej.2021.02.014
[44] H. Khan, J. F. Gomez-Aguilar, T. Abdeljawad, A. Khan, Existence results and stability criteria for ABC-fuzzy-Volterra integro-differential equation, Fractals, 28 (2020), 2040048. https://doi.org/10.1142/S0218348X20400484
[45] A. Khan, T. Abdeljawad, On existence results of coupled pantograph discrete fractional order difference equations with numerical application, Results in Control and Optimization, 13 (2023), 100307. https://doi.org/10.1016/j.rico.2023.100307
[46] W. H. Qi, G. D. Zong, J. Cheng, T. C. Jiao, Robust finite-time stabilization for positive delayed semi-Markovian switching systems, Appl. Math. Comput., 351 (2019), 139–152. https://doi.org/10.1016/j.amc.2018.12.069
[47] B. Wang, Q. X. Zhu, S. B. Li, Stability analysis of discrete-time semi-Markov jump linear systems with time delay, IEEE T. Automat. Contr., 68 (2023), 6758–6765. https://doi.org/10.1109/TAC.2023.3240926