Composite sparsity generalizes standard sparsity by considering sparsity of a linear transformation of the variables. In this paper, we study the composite sparse optimization problem of minimizing the sum of a nondifferentiable loss function and the ℓ0 penalty of a matrix times the coefficient vector. First, we consider an exact continuous relaxation problem with a capped-ℓ1 penalty that has the same optimal solution set as the original problem. Specifically, we introduce the lifted stationary points of the relaxation problem and then establish the equivalence of the original and relaxed problems. Second, we propose a smoothing gradient descent (SGD) algorithm for the continuous relaxation problem, which solves the subproblem inexactly since the objective function is not separable. We show that if the sequence generated by the SGD algorithm has an accumulation point, then it is a lifted stationary point. Finally, we present several computational examples to illustrate the efficiency of the algorithm.
Citation: Wei Yang, Lili Pan, Jinhui Wan. Smoothing gradient descent algorithm for the composite sparse optimization[J]. AIMS Mathematics, 2024, 9(12): 33401-33422. doi: 10.3934/math.20241594
Sparse optimization is a core problem in compressed sensing [1,2,3], signal and image processing [4,5,6,7], and high-dimensional statistical learning [4,8], etc. Sparsity is usually characterized by the ℓ0 norm, which is the cardinality of the support set of a vector $x\in\mathbb{R}^n$, denoted by $\|x\|_0=|\mathrm{supp}(x)|=|\{i\in\{1,2,\dots,n\}:x_i\neq 0\}|$. The penalized formulation of sparse optimization can be expressed as the following cardinality regularized optimization:
$$\min_{x\in\mathbb{R}^n} f(x)+\lambda\|x\|_0, \tag{1.1}$$
where $f:\mathbb{R}^n\to\mathbb{R}$ is a loss function depending on the application and $\lambda>0$ is the penalty parameter.
Compared with sparse optimization, composite sparse optimization problems enforce certain structural constraints instead of pure sparsity on the coefficients. They arise in many important applications in various fields, such as structural health monitoring [10], fault diagnosis [11], motion planning [12] and impact force identification [13]. A common approach is to promote sparsity of the variables through a linear transformation [14]. By imposing a regularization matrix $W=(W_1^\top,\dots,W_p^\top)^\top\in\mathbb{R}^{p\times n}$ on the vector $x$, composite sparse optimization is encapsulated in the following formulation:
$$\min_{x\in\mathbb{R}^n} f(x)+\lambda\|Wx\|_0. \tag{1.2}$$
A typical choice of the function $f$ in problem (1.2) is the $\ell_2$ loss function $f(x)=\frac{1}{2}\|Ax-b\|_2^2$, where $A=(A_1^\top,\dots,A_m^\top)^\top\in\mathbb{R}^{m\times n}$ and $b=(b_1,\dots,b_m)^\top\in\mathbb{R}^m$, together with the $\ell_1$ relaxation $\|Wx\|_1$ of the $\ell_0$ norm; this combination was first defined and summarized as the generalized LASSO with a general matrix $W$ [15]. Unlike the traditional LASSO, solving the generalized LASSO efficiently on high-dimensional data is very challenging. A few attempts have been made to improve its efficiency, but they require a specific form of $W$ to work well [14,15,16], such as the fused LASSO problem [17], the TV regularizer [18] and trend filtering [19].
However, many loss functions of composite sparse optimization problems cannot be expressed as differentiable functions. The results in [20] showed that the least squares loss function can solve a class of linear regression problems but is not suitable for all types of data. Instead, we can choose a loss function that is robust to outliers, such as the $\ell_1$ function, the quantile regression function, or the more general Huber class of functions. On the other hand, J. Fan et al. [20] pointed out that the $\ell_1$ relaxation often results in a biased estimator, and various continuous nonconvex relaxation functions for the $\ell_0$ norm have been proposed, such as the smoothly clipped absolute deviation (SCAD) function [20], the hard thresholding function [21], the capped-$\ell_1$ function [22,23,24,25,26], the transformed $\ell_1$ function [27], etc. Here, we are interested in the capped-$\ell_1$ function as the relaxation of the $\ell_0$ norm, since it is a simple relaxation function that satisfies the required properties. Z. Shen et al. [40] applied locally Lipschitz continuous scaled folded concave functions to approximate the $\ell_0$ pseudo-norm; a generic nonsmooth but convex framework was established to gradually approximate the scaled folded concave functions, and numerical results showed that the proposed framework and algorithms retain the exact sparsity-inducing capability of the $\ell_0$ pseudo-norm. Q. Chen et al. [41] first explored a class of locally Lipschitz scaled folded concave functions to approximate the $\ell_0$ norm; then a convex half-absolute method was proposed to precisely approximate these nonconvex nonsmooth functions, and a double iterative algorithm was considered to solve the convex-relaxed composite optimization problems. Both [40] and [41] established a generic nonsmooth convex framework that gradually approximates these scaled folded concave functions based on the Legendre-Fenchel transformation, so as to avoid directly solving a nonconvex optimization problem. In contrast, we use a smoothing function for approximation in order to solve the nonconvex optimization problem itself. The advantages of the capped-$\ell_1$ function have been explored in various fields. For example, the authors in [28] put forward a capped $\ell_1$-norm sparse representation method (CSR) for graph clustering; the proposed model learned the optimal graph for clustering by integrating sparse representation and a capped $\ell_1$-norm loss function. In order to utilize the advantages of the twin extreme learning machine and FDA, Z. Xue et al. [29] put forward a novel classifier named the Fisher-regularized twin extreme learning machine (FTELM); considering the instability of the $\ell_2$-norm with respect to outliers, the authors also introduced the capped $\ell_1$-norm into the FTELM model and proposed a more robust capped $\ell_1$-norm FTELM (C$\ell_1$-FTELM) model. The capped-$\ell_1$ function was also discussed in the context of sparse group $\ell_0$ regularized algorithms in [30]. It is worth noting that reference [9] gave an exact continuous relaxation with a capped-$\ell_1$ penalty for nonsmooth convex loss functions with cardinality penalty, in the sense that both problems have the same optimal solution set; moreover, a smoothing proximal gradient algorithm for finding a lifted stationary point of the continuous relaxation model was proposed. Regarding the solution of relaxation problems, T. Zhang [42] presented a multi-stage convex relaxation scheme for solving problems with nonconvex objective functions; however, only parameter estimation performance was analyzed in [42].
Unfortunately, the result in [42] does not directly imply that multi-stage convex relaxation achieves unbiased recovery of the support set. H. Zou et al. [43] proposed a new unified algorithm based on the local linear approximation (LLA) for maximizing the penalized likelihood for a broad class of concave penalty functions, but it does not eliminate the bias issue. Here, we extend the results in [9] to composite sparse optimization and give a smoothing gradient descent algorithm for the continuous relaxation problem. The new algorithm exploits the piecewise linearity of the capped-$\ell_1$ penalty term in the relaxation problem. Owing to the composite sparsity, the subproblem in the algorithm does not have a closed-form solution; our relaxation model is analogous to the $\ell_1$-penalized model of LASSO problems, and we use the smoothing gradient method to solve the subproblem. We prove that if the sequence generated by the algorithm has an accumulation point, then it is a lifted stationary point of the relaxation problem.
In this paper, we consider the following composite sparse regression problem with cardinality penalty,
$$\min_{x\in\Omega} W_{\ell_0}(x):=f(x)+\lambda\|Wx\|_0, \tag{1.3}$$
where $f:\mathbb{R}^n\to\mathbb{R}$ is convex (not necessarily smooth) and bounded from below, $\lambda$ is a positive parameter, and $\Omega=\{x\in\mathbb{R}^n: l\le Wx\le u\}$. Examples include the $\ell_1$ loss function given by
$$f(x)=\frac{1}{m}\|Ax-b\|_1, \tag{1.4}$$
or the censored regression problem with
$$f(x)=\frac{1}{m}\sum_{i=1}^{m}|\max\{A_ix,0\}-b_i|. \tag{1.5}$$
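As an illustration (our notation, not part of the paper's formulation), these two losses can be written as MATLAB function handles once the data $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$ are given:

```matlab
% ell_1 loss (1.4) and censored regression loss (1.5); A and b are the problem data
f_l1   = @(x) mean(abs(A*x - b));            % (1/m)*||Ax - b||_1
f_cens = @(x) mean(abs(max(A*x, 0) - b));    % (1/m)*sum_i |max{A_i x, 0} - b_i|
```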
For a given parameter ν>0, the continuous relaxation of the ℓ0 penalty with the capped-ℓ1 function ϕ is given by
$$\phi(t)=\min\Big\{1,\frac{|t|}{\nu}\Big\}. \tag{1.6}$$
We consider the following continuous optimization problem to solve (1.3):
$$\min_{x\in\Omega} W(x):=f(x)+\lambda\Phi(Wx), \tag{1.7}$$
where $\Phi(Wx)=\sum_{i=1}^{p}\phi(W_ix)$.
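For concreteness, the relaxed objective (1.7) can be evaluated with a few MATLAB handles; this is a minimal sketch with names of our choosing, assuming a loss handle f, a matrix W, and parameters lambda and nu are available in the workspace:

```matlab
% Capped-ell_1 relaxation (1.6)-(1.7), evaluated elementwise on W*x
phi     = @(t) min(1, abs(t)/nu);        % capped-ell_1 function phi in (1.6)
PhiW    = @(x) sum(phi(W*x));            % Phi(Wx) = sum_i phi(W_i x)
W_relax = @(x) f(x) + lambda*PhiW(x);    % objective W(x) of the relaxation (1.7)
```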
Composite sparse optimization has attracted much attention recently. In [15], a dual path algorithm was proposed for the generalized LASSO problem with any form of $W$. If the composite optimization problem is convex and $W$ is a general linear map, one feasible approach is to apply the alternating direction method of multipliers (ADMM) [31]. In [31], the authors proposed a dual method for the variational problem in the form $\inf\{f(Av)+g(v)\}$. Z. J. Bai et al. [32] provided a coordinate gradient descent method with stepsize chosen by an Armijo-type rule to solve the problems $\min_x f(x)+c\|Lx\|_1$ and $\min_x\|Ax-b\|_2^2+c\|Lx\|_1$ efficiently, especially when the problem dimension is large.
In this paper, we use the exact continuous relaxation problem with the capped-$\ell_1$ function to solve optimization problem (1.3) and present a smoothing gradient descent (SGD) algorithm. Since $W$ is a general linear mapping and $\Phi(Wx)$ is not separable, the proximal gradient algorithm cannot be applied explicitly, so we solve the subproblem in the SGD algorithm approximately. We prove that if the generated sequence has an accumulation point, then the accumulation point is a lifted stationary point of (1.7).
Notation. We denote $\mathbb{N}=\{0,1,\dots\}$. For $x\in\mathbb{R}^n$ and $\delta>0$, let $\|x\|:=\|x\|_2$ and let $B_\delta(x)$ denote the open ball centered at $x$ with radius $\delta$. For a nonempty, closed, and convex set $\Omega\subseteq\mathbb{R}^n$, $N_\Omega(x)$ denotes the normal cone to $\Omega$ at $x\in\Omega$. $\sigma_{\min}(W)$ is the minimum singular value of $W$. We denote $\|x\|_\infty=\max\{|x_1|,|x_2|,\dots,|x_n|\}$.
Before starting this section, we first make the following two assumptions:
Assumption 2.1. Function f is Lipschitz continuous on Ω with Lipschitz constant Lf>0 and matrix W has full column rank.
Assumption 2.2. Parameter $\nu$ in (1.6) satisfies $0<\nu<\bar\nu:=\frac{\lambda\sigma_{\min}(W)}{L_f}$.
We suppose Assumptions 2.1 and 2.2 hold throughout the paper and assume that $L_f$ is large enough such that $L_f\ge\frac{\lambda\sigma_{\min}(W)}{\Gamma}$, where
$$\Gamma:=\min\{|l_i|,u_i:\ l_i\neq 0,\ u_i\neq 0,\ i=1,\dots,p\}.$$
When f is defined by the ℓ1 loss function (1.4) or the censored regression loss function (1.5), Lf can be taken as Lf=‖A‖∞.
We first give the definition of lifted stationary points of (1.7) as in [33]. Since the function $\phi$ in (1.6) can be rephrased as
$$\phi(t)=\frac{|t|}{\nu}-\max\{\theta_1(t),\theta_2(t),\theta_3(t)\}$$
with $\theta_1(t)=0$, $\theta_2(t)=\frac{t}{\nu}-1$ and $\theta_3(t)=-\frac{t}{\nu}-1$ for $t\in\mathbb{R}$, we denote
$$D(t):=\{i\in\{1,2,3\}:\theta_i(t)=\max\{\theta_1(t),\theta_2(t),\theta_3(t)\}\} \tag{2.1}$$
and
$$D^p(Wx):=\Pi_{i=1}^{p}D(W_ix). \tag{2.2}$$
Definition 2.1. A point $x\in\Omega$ is called a lifted stationary point of (1.7) if there exists $d=(d_1,\dots,d_p)^\top\in D^p(Wx)$ such that
$$\lambda\sum_{i=1}^{p}W_i^\top\theta'_{d_i}(W_ix)\in\partial f(x)+\frac{\lambda}{\nu}\sum_{i=1}^{p}W_i^\top\vartheta_i(x)+N_\Omega(x), \tag{2.3}$$
where
$$\vartheta_i(x)\begin{cases}=1 & \text{if } W_ix>0,\\ \in[-1,1] & \text{if } W_ix=0,\\ =-1 & \text{if } W_ix<0.\end{cases} \tag{2.4}$$
Under the definition of the range of ν in Assumption 2.2, we first prove that the element in Dp(Wx) for a lifted stationary point satisfying (2.3) is unique.
Proposition 2.2. If $\bar x$ is a lifted stationary point of (1.7), then the vector $d^{W\bar x}=(d_1^{W\bar x},\dots,d_p^{W\bar x})^\top\in D^p(W\bar x)$ satisfying (2.3) is unique. In particular, for $i=1,\dots,p$,
$$d_i^{W\bar x}=\begin{cases}1 & \text{if } |W_i\bar x|<\nu,\\ 2 & \text{if } W_i\bar x\ge\nu,\\ 3 & \text{if } W_i\bar x\le-\nu.\end{cases} \tag{2.5}$$
Proof. If $|W_i\bar x|\neq\nu$, the statement is clearly valid. Hence, we only need to consider the case $|W_i\bar x|=\nu$. When $W_i\bar x=\nu$, we have $D(W_i\bar x)=\{1,2\}$. Arguing by contradiction, assume that (2.3) holds with $d_i^{W\bar x}=1$, so $\vartheta_i(\bar x)=1$. By $\nu<\bar\nu$, we have $W_i\bar x\in(l_i,u_i)$, and by (2.3) there exists $\xi(\bar x)\in\partial f(\bar x)$ such that
$$0=\xi(\bar x)+\frac{\lambda}{\nu}\sum_{i:|W_i\bar x|\le\nu}W_i^\top\vartheta_i(\bar x), \tag{2.6}$$
where $\vartheta_i(\bar x)$ is defined as in (2.4).
It is easy to observe that the following relation holds
$$\Big\|\sum_{i:|W_i\bar x|\le\nu}W_i^\top\vartheta_i(\bar x)\Big\|_2\ge\sigma_{\min}(W). \tag{2.7}$$
In fact, from the definition of the minimum singular value of $W$,
$$\sigma_{\min}(W)=\min\Big\{\frac{\|Wx\|_2}{\|x\|_2}:x\neq 0\Big\}=\min\{\|Wx\|_2:\|x\|_2=1\},$$
we have
$$\sigma_{\min}(W)=\min\Big\{\frac{\|Wx\|_2}{\|x\|_2}:x\neq 0\Big\}\le\min\Big\{\frac{\|Wx\|_2}{\|x\|_2}:\|x\|_2\ge 1,\ \|x\|_\infty\le 1\Big\}\le\min\{\|Wx\|_2:\|x\|_2\ge 1,\ \|x\|_\infty\le 1\}\le\min\{\|Wx\|_2:\|x\|_2=1\}=\sigma_{\min}(W).$$
Then, we see that
$$\sigma_{\min}(W)=\min\{\|Wx\|_2:\|x\|_2\ge 1,\ \|x\|_\infty\le 1\}.$$
From the definition of $\vartheta_i(\bar x)$ in (2.4), this yields
$$\Big\|\sum_{i:|W_i\bar x|\le\nu}W_i^\top\vartheta_i(\bar x)\Big\|_2\ge\sigma_{\min}(W_I)\ge\sigma_{\min}(W),$$
where $W_I$ is the submatrix consisting of the rows of $W$ indexed by $I:=\{i:|W_i\bar x|\le\nu\}$ [34].
Combining (2.6) and (2.7), we have
$$\frac{\lambda\sigma_{\min}(W)}{\nu}\le\frac{\lambda}{\nu}\Big\|\sum_{i:|W_i\bar x|\le\nu}W_i^\top\vartheta_i(\bar x)\Big\|=\|\xi(\bar x)\|\le L_f.$$
This contradicts $\nu<\frac{\lambda\sigma_{\min}(W)}{L_f}$. Then, (2.5) holds for $W_i\bar x=\nu$. A similar analysis can be given for the case $W_i\bar x=-\nu$, which completes the proof.
For a given $d=(d_1,\dots,d_p)^\top\in D^p:=\{d\in\mathbb{R}^p:d_i\in\{1,2,3\},\ i=1,\dots,p\}$, we define
$$\Phi_d(Wx):=\sum_{i=1}^{p}\frac{|W_ix|}{\nu}-\sum_{i=1}^{p}\theta_{d_i}(W_ix), \tag{2.8}$$
which is convex with respect to x. It can be verified that
$$\Phi(Wx)=\min_{d\in D^p}\Phi_d(Wx),\quad\forall x\in\Omega.$$
In particular, for a fixed $\bar x\in\Omega$, $\Phi(W\bar x)=\Phi_{d^{W\bar x}}(W\bar x)$ and the following relation holds:
$$\bar x\ \text{is a lifted stationary point of (1.7)}\ \Longleftrightarrow\ \bar x\in\arg\min_{x\in\Omega}f(x)+\lambda\Phi_{d^{W\bar x}}(Wx). \tag{2.9}$$
The next lemma describes a lower bound property.
Lemma 2.3. If $\bar x\in\Omega$ is a lifted stationary point of (1.7), then it holds that
$$W_i\bar x\in(-\nu,\nu)\ \Rightarrow\ W_i\bar x=0,\quad\forall i=1,\dots,p. \tag{2.10}$$
Proof. Suppose that $\bar x$ is a lifted stationary point of (1.7), and assume that $W_i\bar x\in(-\nu,\nu)\setminus\{0\}$ for some $i\in\{1,\dots,p\}$. From (2.5) and Assumption 2.1, we have $d_i^{W\bar x}=1$ and $W_i\bar x\in(l_i,u_i)$. By (2.3), there exists $\xi(\bar x)\in\partial f(\bar x)$ such that
$$0=\xi(\bar x)+\frac{\lambda}{\nu}\sum_{i:|W_i\bar x|<\nu}W_i^\top\vartheta_i(\bar x).$$
Through the analysis in the proof of Proposition 2.2, combining (2.6) and (2.7), we have
$$\frac{\lambda\sigma_{\min}(W)}{\nu}\le\frac{\lambda}{\nu}\Big\|\sum_{i:|W_i\bar x|<\nu}W_i^\top\vartheta_i(\bar x)\Big\|=\|\xi(\bar x)\|\le L_f,$$
which contradicts $\nu<\frac{\lambda\sigma_{\min}(W)}{L_f}$. Consequently, $W_i\bar x\in(-\nu,\nu)$ implies $W_i\bar x=0$ for $i\in\{1,\dots,p\}$, and the proof is completed.
This subsection discusses the relationship between the global minimizers and local minimizers of (1.3) and (1.7). First, Theorem 2.4 discusses the relationship between the local minimizers of (1.3) and (1.7). Second, Theorem 2.5 states that (1.3) and (1.7) have the same global minimizers. We use the lower bound property mentioned in Lemma 2.3 to prove Theorems 2.4 and 2.5.
Theorem 2.4. If $\bar x$ is a lifted stationary point of (1.7), then it is a local minimizer of (1.3) and the objective functions have the same value at $\bar x$, i.e., $f(\bar x)+\lambda\Phi(W\bar x)=f(\bar x)+\lambda\|W\bar x\|_0$.
Proof. Combining the lower bound property of $W\bar x$ in (2.10) and the definition of $\Phi_{d^{W\bar x}}$ in (2.8), for any $x\in\mathbb{R}^n$ we have
$$\Phi_{d^{W\bar x}}(Wx):=\sum_{i=1}^{p}\frac{|W_ix|}{\nu}-\sum_{i=1}^{p}\theta_{d_i^{W\bar x}}(W_ix)=\sum_{i:|W_i\bar x|\ge\nu}1+\sum_{i:|W_i\bar x|<\nu}\frac{|W_ix|}{\nu}=\|W\bar x\|_0+\sum_{i:W_i\bar x=0}\frac{|W_ix|}{\nu}.$$
Then
$$\Phi_{d^{W\bar x}}(Wx)\le\|Wx\|_0,\quad\forall x\in B_\varrho(\bar x),\ \text{for some }\varrho>0. \tag{2.11}$$
Combining this with Φ(Wˉx)=‖Wˉx‖0 and (2.9), we have
$$f(\bar x)+\lambda\|W\bar x\|_0\le f(x)+\lambda\|Wx\|_0,\quad\forall x\in\Omega\cap B_\varrho(\bar x).$$
Thus, $\bar x$ is a local minimizer of (1.3).
Theorem 2.4 indicates that any lifted stationary point of (1.7) is a local minimizer of (1.3); in particular, any local minimizer of (1.7) is also a local minimizer of (1.3).
Theorem 2.5. A point $\bar x\in\Omega$ is a global minimizer of (1.3) if and only if it is a global minimizer of (1.7). Moreover, problems (1.3) and (1.7) have the same optimal value.
Proof. On the one hand, let $\bar x\in\Omega$ be a global minimizer of (1.7); by Definition 2.1, $\bar x$ is a lifted stationary point of (1.7). By (2.10), $W_i\bar x\in(-\nu,\nu)$ implies $W_i\bar x=0$, which gives $\Phi(W\bar x)=\|W\bar x\|_0$. We have
$$f(\bar x)+\lambda\|W\bar x\|_0=f(\bar x)+\lambda\Phi(W\bar x)\le f(x)+\lambda\Phi(Wx)\le f(x)+\lambda\|Wx\|_0,\quad\forall x\in\Omega,$$
where the last inequality uses Φ(Wx)≤‖Wx‖0. Therefore, ˉx is a global minimizer of (1.3).
On the other hand, let $\bar x\in\Omega$ be a global minimizer of (1.3), and assume on the contrary that $\bar x$ is not a solution of (1.7). Let $\hat x$ be a global minimizer of (1.7); we obtain
$$f(\hat x)+\lambda\Phi(W\hat x)<f(\bar x)+\lambda\Phi(W\bar x).$$
From
$$\Phi(W\hat x)=\|W\hat x\|_0\quad\text{and}\quad\Phi(W\bar x)\le\|W\bar x\|_0,$$
we have
$$f(\hat x)+\lambda\|W\hat x\|_0<f(\bar x)+\lambda\|W\bar x\|_0.$$
This contradicts the global optimality of ˉx for (1.3). Hence ˉx is a global minimizer of (1.7). Therefore, (1.3) and (1.7) have the same global minimizers and optimal values.
When $f$ is convex, $\bar x$ is a local minimizer of (1.3) if and only if $\bar x\in\Omega$ satisfies
$$0\in\partial f(\bar x)+N_\Omega(\bar x), \tag{2.12}$$
which is often used as a criterion for the local minimizers of problem (1.3).
Definition 2.6. We call $\bar x\in\Omega$ a $\nu$-strong local minimizer of (1.3) if there exist $\bar\xi\in\partial f(\bar x)$ and $\bar\eta\in N_\Omega(\bar x)$ such that for any $i\in\mathrm{supp}(W\bar x)$, it holds that
$$0=\bar\xi+\bar\eta\quad\text{and}\quad|W_i\bar x|\ge\nu.$$
By (2.12), any ν-strong local minimizer of (1.3) is a local minimizer of it. Below we provide a result on the relationship between the ν-strong local minimizers of (1.3) and the lifted stationary points of (1.7).
Proposition 2.7. ˉx∈Ω is a ν-strong local minimizer of (1.3) if and only if it is a lifted stationary point of (1.7).
Proof. First, by (2.9), we see that if $\bar x$ is a lifted stationary point of (1.7), then
$$W_{\ell_0}(\bar x)=f(\bar x)+\lambda\|W\bar x\|_0=f(\bar x)+\lambda\Phi(W\bar x)=f(\bar x)+\lambda\Phi_{d^{W\bar x}}(W\bar x)\le f(x)+\lambda\Phi_{d^{W\bar x}}(Wx),\quad\forall x\in\Omega.$$
Combining Lemma 2.3 with $\Phi_{d^{W\bar x}}(Wx)\le\|Wx\|_0$ for all $x\in B_\varrho(\bar x)$ in (2.11), we then have
$$W_{\ell_0}(\bar x)\le W_{\ell_0}(x),\quad\forall x\in\Omega\cap B_\varrho(\bar x),$$
so $\bar x$ is a $\nu$-strong local minimizer of (1.3).
Conversely, since $\bar x$ is a $\nu$-strong local minimizer of (1.3), it is also a local minimizer of (1.3). Suppose that $\bar x$ is a local minimizer of (1.3) but not a local minimizer of (1.7). Then there exists a local minimizer of (1.7), denoted by $\hat x$; combining (2.9), $\Phi_{d^{W\hat x}}(W\bar x)\le\|W\bar x\|_0$ for all $\bar x\in B_\varrho(\hat x)$ in (2.11), and $\Phi(W\hat x)=\|W\hat x\|_0$, we have
$$f(\hat x)+\lambda\|W\hat x\|_0=f(\hat x)+\lambda\Phi(W\hat x)=f(\hat x)+\lambda\Phi_{d^{W\hat x}}(W\hat x)\le f(\bar x)+\lambda\Phi_{d^{W\hat x}}(W\bar x)\le f(\bar x)+\lambda\|W\bar x\|_0,\quad\forall\bar x\in B_\varrho(\hat x),$$
which leads to a contradiction. Thus, a local minimizer of (1.3) is a local minimizer of (1.7); that is, a $\nu$-strong local minimizer of (1.3) is a lifted stationary point of (1.7).
We use Table 1 to clearly demonstrate the relationship between (1.3) and (1.7).
[Table 1. Summary of the relationships between problems (1.3) and (1.7).]
The main goal of this section is to find a lifted stationary point of (1.7). Due to the presence of the matrix $W$, we cannot write down the explicit solution of the proximal subproblem as in the proximal gradient method. We first approximate $f$ by a smoothing function and present some preliminary results on smoothing methods; the second subsection proposes our algorithm; and the third subsection provides a convergence analysis of the proposed algorithm for solving (1.7).
Throughout this paper, we approximate the loss function f by a smoothing function ˜f in (1.7). When it is clear from the context, the derivative of ˜f(x,μ) with respect to x is simply denoted as ∇˜f(x,μ). We denote
$$\widetilde W_d(x,\mu):=\tilde f(x,\mu)+\lambda\Phi_d(Wx),\qquad \widetilde W(x,\mu):=\tilde f(x,\mu)+\lambda\Phi(Wx),$$
where smoothing parameter μ>0 and d∈Dp. For any fixed μ>0 and d∈Dp, ˜Wd(x,μ) is nonsmooth and convex, and ˜W(x,μ) is nonsmooth and nonconvex. Due to
$$\Phi(Wx)=\min_{d\in D^p}\Phi_d(Wx),\quad\forall x\in\Omega,$$
we obtain
$$\widetilde W_d(x,\mu)\ge\widetilde W(x,\mu),\quad\forall d\in D^p,\ x\in\Omega,\ \mu\in(0,\bar\mu]. \tag{3.1}$$
The following definition collects the properties of the smoothing function $\tilde f$ that are used frequently in the convergence analysis.
Definition 3.1. We call $\tilde f:\mathbb{R}^n\times[0,\bar\mu]\to\mathbb{R}$ with $\bar\mu>0$ a smoothing function of the convex function $f$ in (1.7), if $\tilde f(\cdot,\mu)$ is continuously differentiable in $\mathbb{R}^n$ for any fixed $\mu>0$ and satisfies the following conditions:
(ⅰ) $\lim_{z\to x,\,\mu\downarrow 0}\tilde f(z,\mu)=f(x),\ \forall x\in\Omega$;
(ⅱ) (convexity) $\tilde f(x,\mu)$ is convex with respect to $x$ in $\Omega$ for any fixed $\mu>0$;
(ⅲ) (gradient consistency) $\big\{\lim_{z\to x,\,\mu\downarrow 0}\nabla_z\tilde f(z,\mu)\big\}\subseteq\partial f(x),\ \forall x\in\Omega$;
(ⅳ) (Lipschitz continuity with respect to $\mu$) there exists a positive constant $\kappa$ such that
$$|\tilde f(x,\mu_2)-\tilde f(x,\mu_1)|\le\kappa|\mu_1-\mu_2|,\quad\forall x\in\Omega,\ \mu_1,\mu_2\in[0,\bar\mu];$$
(ⅴ) (Lipschitz continuity with respect to $x$) there exists a constant $L>0$ such that for any $\mu\in(0,\bar\mu]$, $\nabla_x\tilde f(\cdot,\mu)$ is Lipschitz continuous on $\Omega$ with Lipschitz constant $L\mu^{-1}$.
By virtue of Definition 3.1-(ⅳ), we obtain that
$$|\tilde f(x,\mu)-f(x)|\le\kappa\mu,\quad\forall x\in\Omega,\ 0<\mu\le\bar\mu. \tag{3.2}$$
Next, we aim to solve the following problem with $\mu>0$ and vector $d\in D^p$:
$$\min_{x\in\Omega}\widetilde W_d(x,\mu)=\tilde f(x,\mu)+\lambda\Phi_d(Wx), \tag{3.3}$$
by introducing an approximation of ˜Wd(⋅,μ) around a given point z as follows:
$$G_{d,\gamma}(x,z,\mu)=\tilde f(z,\mu)+\langle x-z,\nabla\tilde f(z,\mu)\rangle+\frac{\gamma}{2\mu}\|x-z\|^2+\lambda\Phi_d(Wx) \tag{3.4}$$
with a constant $\gamma>0$. Since $\Phi_d(Wx)$ is convex with respect to $x$ for any fixed $d\in D^p$, the function $G_{d,\gamma}(x,z,\mu)$ is strongly convex with respect to $x$ for any fixed $d,\gamma,z$ and $\mu$. Then, we solve the following problem
$$\min_{x\in\Omega}G_{d,\gamma}(x,z,\mu)$$
to find the approximate solution of (3.3).
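For concreteness, the surrogate (3.4) can be written down directly; the sketch below assumes MATLAB handles f_sm and grad_f for the smoothing function $\tilde f$ and its gradient, together with workspace variables W, lambda and nu (all names are ours, purely illustrative):

```matlab
% Phi_d(t) for t = W*x, with theta_1 = 0, theta_2(t) = t/nu - 1, theta_3(t) = -t/nu - 1
Phi_d = @(t, d) sum(abs(t))/nu - sum((d==2).*(t/nu - 1) + (d==3).*(-t/nu - 1));
% Surrogate G_{d,gamma}(x, z, mu) of (3.4), built around the point z
G = @(x, z, mu, d, gamma) f_sm(z, mu) + (x - z)'*grad_f(z, mu) ...
    + (gamma/(2*mu))*norm(x - z)^2 + lambda*Phi_d(W*x, d);
```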
In this subsection, we propose a new algorithm (see Algorithm 1) for finding a lifted stationary point of (1.7). Specifically, since $W$ is a general linear mapping and $\Phi(Wx)$ is not separable, the smoothing proximal gradient algorithm [9] cannot solve its subproblem explicitly. The proposed algorithm combines the smoothing method with a gradient descent scheme, so we call it the smoothing gradient descent (SGD) algorithm, and we use it to obtain approximate solutions of the subproblem. Let
$$P_s=\{k\in\mathbb{N}:\mu_{k+1}\neq\mu_k\},$$
and denote by $p_s^r$ the $r$th smallest number in $P_s$. Then, we obtain the following updating rule for $\{\mu_k\}$:
$$\mu_k=\mu_{p_s^r+1}=\frac{\mu_0}{(p_s^r+1)^\sigma},\quad\forall\, p_s^r+1\le k\le p_s^{r+1}, \tag{3.5}$$
which will be used in the proof of Lemmas 3.2 and 3.4.
Algorithm 1: Smoothing Gradient Descent (SGD) algorithm
Require: Take $x^{-1}=x^0\in\Omega$ and $\mu_{-1}=\mu_0\in(0,\bar\mu]$. Give parameters $\rho>1$, $\sigma>\frac12$, $\alpha>0$ and $0<\underline\gamma<\bar\gamma$. Set $k=0$.
while a termination criterion is not met do
  Step 1. Choose $\gamma_k\in[\underline\gamma,\bar\gamma]$ and let $d^k\triangleq d^{Wx^k}$, where $d^{Wx^k}$ is defined in (2.5).
  Step 2.
    2a) Compute
    $$\hat x^{k+1}=\arg\min_{x\in\Omega}G_{d^k,\gamma_k}(x,x^k,\mu_k). \tag{3.6}$$
    2b) If $\hat x^{k+1}$ satisfies
    $$\widetilde W_{d^k}(\hat x^{k+1},\mu_k)\le G_{d^k,\gamma_k}(\hat x^{k+1},x^k,\mu_k), \tag{3.7}$$
    set
    $$x^{k+1}=\hat x^{k+1} \tag{3.8}$$
    and go to Step 3. Otherwise, let $\gamma_k=\rho\gamma_k$ and return to 2a).
  Step 3. If
    $$\widetilde W(x^{k+1},\mu_k)+\kappa\mu_k-\widetilde W(x^k,\mu_{k-1})-\kappa\mu_{k-1}\le-\alpha\mu_k^2, \tag{3.9}$$
    set $\mu_{k+1}=\mu_k$; otherwise, set
    $$\mu_{k+1}=\frac{\mu_0}{(k+1)^\sigma}. \tag{3.10}$$
  Increment $k$ by one and return to Step 1.
end while
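To make the iteration concrete, the following is a minimal MATLAB sketch of Algorithm 1 under simplifying assumptions: the box constraints are taken to be inactive (so $\Omega=\mathbb{R}^n$), the subproblem (3.6) is solved only approximately by a fixed number of subgradient steps on $G_{d^k,\gamma_k}$, and the backtracking test (3.7) on $\gamma_k$ is omitted. The function and variable names are ours and do not correspond to the authors' reference implementation.

```matlab
function x = sgd_sketch(f_sm, grad_f, W, lambda, nu, x0, mu0)
% f_sm(x,mu), grad_f(x,mu): a smoothing function of f and its gradient (Definition 3.1)
gamma = 1; sigma = 3; alpha = 1; kappa = 0.5;      % illustrative parameter choices
maxit = 1000; eps_mu = 1e-3; inner_it = 50;
x = x0; mu = mu0; mu_prev = mu0;
PhiW = @(z) sum(min(1, abs(W*z)/nu));              % Phi(Wz) as in (1.7)
Wtil_prev = f_sm(x0, mu0) + lambda*PhiW(x0);       % ~W(x^0, mu_{-1})
for k = 0:maxit
    if mu <= eps_mu, break; end                    % stopping criterion (4.1)
    Wx = W*x;
    d2 = (Wx >= nu); d3 = (Wx <= -nu);             % d_i per (2.5): 2, 3, otherwise 1
    g0 = grad_f(x, mu);
    z = x;                                         % inner loop: approximate solution of (3.6)
    for j = 1:inner_it
        t = W*z;
        s = (sign(t) - d2 + d3)/nu;                % a subgradient of Phi_d at t
        gz = g0 + (gamma/mu)*(z - x) + lambda*(W'*s);
        z = z - (mu/(gamma*j))*gz;                 % diminishing step (strong convexity gamma/mu)
    end
    x = z;                                         % accept x^{k+1} (test (3.7) omitted)
    Wtil = f_sm(x, mu) + lambda*PhiW(x);           % ~W(x^{k+1}, mu_k)
    if Wtil + kappa*mu - Wtil_prev - kappa*mu_prev > -alpha*mu^2   % (3.9) fails
        mu_new = mu0/(k+1)^sigma;                  % update (3.10)
    else
        mu_new = mu;                               % keep mu_{k+1} = mu_k
    end
    mu_prev = mu; Wtil_prev = Wtil; mu = mu_new;
end
end
```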
Lemma 3.2. The proposed SGD algorithm is well-defined, and the sequences $\{x^k\}$, $\{\gamma_k\}$ and $\{\mu_k\}$ generated by it have the following properties:
(ⅰ) $\{x^k\}\subseteq\Omega$ and $\{\gamma_k\}\subseteq[\underline\gamma,\max\{\bar\gamma,\rho L\}]$;
(ⅱ) there are infinitely many elements in $P_s$ and $\lim_{k\to\infty}\mu_k=0$.
Proof. (ⅰ) Rearranging (3.7), we obtain
$$\tilde f(\hat x^{k+1},\mu_k)\le\tilde f(x^k,\mu_k)+\langle\nabla\tilde f(x^k,\mu_k),\hat x^{k+1}-x^k\rangle+\frac{\gamma_k}{2\mu_k}\|\hat x^{k+1}-x^k\|^2.$$
According to Definition 3.1-(ⅴ), (3.7) holds when $\gamma_k\ge L$. Thus $\gamma_k$ is updated in Step 2 at most $\log_\rho(L/\underline\gamma)+1$ times at each iteration. Hence, the SGD algorithm is well-defined, and $\gamma_k\le\max\{\bar\gamma,\rho L\}$ for all $k\in\mathbb{N}$. From (3.8), it is easy to verify that $x^{k+1}\in\Omega$ by $x^k\in\Omega$ and $\hat x^{k+1}\in\Omega$.
(ⅱ) Since $\{\mu_k\}$ is non-increasing, to prove (ⅱ) we assume by contradiction that $\lim_{k\to\infty}\mu_k=\hat\mu>0$. If $\{\mu_k\}$ converges to a nonzero value, then the update (3.10) occurs only finitely many times, which means that there exists $K\in\mathbb{N}$ such that $\mu_k=\hat\mu$ for all $k\ge K$. Substituting $\hat\mu$ into (3.9), we obtain
$$\widetilde W(x^{k+1},\mu_k)+\kappa\mu_k-\widetilde W(x^k,\mu_{k-1})-\kappa\mu_{k-1}\le-\alpha\hat\mu^2,\quad\forall k\ge K+1.$$
By the above inequality, we have
$$\lim_{k\to\infty}\widetilde W(x^{k+1},\mu_k)+\kappa\mu_k=-\infty. \tag{3.11}$$
However, by $\{x^k\}\subseteq\Omega$, (3.2) and Theorem 2.5,
$$\widetilde W(x^{k+1},\mu_k)+\kappa\mu_k\ge W(x^{k+1})\ge\min_{x\in\Omega}W(x)=\min_{x\in\Omega}W_{\ell_0}(x),\quad\forall k\ge K. \tag{3.12}$$
(3.11) and (3.12) are contradictory; (ⅱ) holds.
Lemma 3.3. For any $k\in\mathbb{N}$, we have
$$\widetilde W(x^{k+1},\mu_k)-\widetilde W(x^k,\mu_k)\le-\frac{\gamma_k}{2\mu_k}\|x^{k+1}-x^k\|^2, \tag{3.13}$$
which implies that $\{\widetilde W(x^{k+1},\mu_k)+\kappa\mu_k\}$ is non-increasing and $\lim_{k\to\infty}\widetilde W(x^{k+1},\mu_k)=\lim_{k\to\infty}W(x^k)$.
Proof. Since Gdk,γk(x,xk,μk) is strongly convex with modulus γkμ−1k, we have
$$G_{d^k,\gamma_k}(x,x^k,\mu_k)\ge G_{d^k,\gamma_k}(\hat x^{k+1},x^k,\mu_k)+\langle\nabla G_{d^k,\gamma_k}(\hat x^{k+1},x^k,\mu_k),x-\hat x^{k+1}\rangle+\frac{\gamma_k}{2\mu_k}\|\hat x^{k+1}-x\|^2, \tag{3.14}$$
using the definition of ˆxk+1 in (3.6) and xk+1=ˆxk+1 when (3.7) holds, we obtain
$$G_{d^k,\gamma_k}(x^{k+1},x^k,\mu_k)\le G_{d^k,\gamma_k}(x,x^k,\mu_k)-\frac{\gamma_k}{2\mu_k}\|x^{k+1}-x\|^2,\quad\forall x\in\Omega.$$
By the definition of function Gdk,γk given in (3.4), organizing the inequalities above, we have
$$\lambda\Phi_{d^k}(Wx^{k+1})\le\lambda\Phi_{d^k}(Wx)+\langle x-x^{k+1},\nabla\tilde f(x^k,\mu_k)\rangle+\frac{\gamma_k}{2\mu_k}\|x-x^k\|^2-\frac{\gamma_k}{2\mu_k}\|x^{k+1}-x^k\|^2-\frac{\gamma_k}{2\mu_k}\|x^{k+1}-x\|^2. \tag{3.15}$$
Moreover, (3.7) can be written as
$$\widetilde W_{d^k}(x^{k+1},\mu_k)\le\tilde f(x^k,\mu_k)+\langle x^{k+1}-x^k,\nabla\tilde f(x^k,\mu_k)\rangle+\frac{\gamma_k}{2\mu_k}\|x^{k+1}-x^k\|^2+\lambda\Phi_{d^k}(Wx^{k+1}). \tag{3.16}$$
Summing up (3.15) and (3.16), we obtain that
$$\widetilde W_{d^k}(x^{k+1},\mu_k)\le\tilde f(x^k,\mu_k)+\lambda\Phi_{d^k}(Wx)+\langle x-x^k,\nabla\tilde f(x^k,\mu_k)\rangle+\frac{\gamma_k}{2\mu_k}\|x-x^k\|^2-\frac{\gamma_k}{2\mu_k}\|x^{k+1}-x\|^2,\quad\forall x\in\Omega. \tag{3.17}$$
For a fixed μ>0, the convexity of ˜f(x,μ) with respect to x indicates
$$\tilde f(x^k,\mu_k)+\langle x-x^k,\nabla\tilde f(x^k,\mu_k)\rangle\le\tilde f(x,\mu_k),\quad\forall x\in\Omega. \tag{3.18}$$
Combining (3.17) and (3.18) and utilizing the definition of ˜Wdk, one has
$$\widetilde W_{d^k}(x^{k+1},\mu_k)\le\widetilde W_{d^k}(x,\mu_k)+\frac{\gamma_k}{2\mu_k}\|x-x^k\|^2-\frac{\gamma_k}{2\mu_k}\|x^{k+1}-x\|^2,\quad\forall x\in\Omega. \tag{3.19}$$
Letting x=xk in (3.19) and by dk=dWxk, we obtain
$$\widetilde W_{d^k}(x^{k+1},\mu_k)+\frac{\gamma_k}{2\mu_k}\|x^{k+1}-x^k\|^2\le\widetilde W(x^k,\mu_k). \tag{3.20}$$
Because $\widetilde W_{d^k}(x^{k+1},\mu_k)\ge\widetilde W(x^{k+1},\mu_k)$, (3.20) leads to (3.13). Due to Definition 3.1-(ⅳ), we have
$$\tilde f(x^k,\mu_{k-1})\ge\tilde f(x^k,\mu_k)-\kappa(\mu_{k-1}-\mu_k),$$
and then it follows that
$$\widetilde W(x^k,\mu_k)\le\widetilde W(x^k,\mu_{k-1})+\kappa(\mu_{k-1}-\mu_k);$$
by (3.13), we obtain
$$\widetilde W(x^{k+1},\mu_k)+\kappa\mu_k+\frac{\gamma_k}{2\mu_k}\|x^{k+1}-x^k\|^2\le\widetilde W(x^k,\mu_{k-1})+\kappa\mu_{k-1}. \tag{3.21}$$
Inequality (3.21) implies the non-increasing property of $\{\widetilde W(x^{k+1},\mu_k)+\kappa\mu_k\}$. This result and (3.12) ensure the existence of $\lim_{k\to\infty}\widetilde W(x^{k+1},\mu_k)+\kappa\mu_k$. By virtue of $\lim_{k\to\infty}\mu_k=0$ and Definition 3.1-(ⅰ), we obtain
$$\lim_{k\to\infty}\widetilde W(x^{k+1},\mu_k)=\lim_{k\to\infty}W(x^k).$$
The proof is completed.
Lemma 3.4. The following statements hold:
(ⅰ) $\sum_{k=0}^{\infty}\gamma_k\mu_k^{-1}\|x^{k+1}-x^k\|^2\le 2\big(\widetilde W(x^0,\mu_{-1})+\kappa\mu_{-1}-\min_\Omega W\big)$;
(ⅱ) $\sum_{k=0}^{\infty}\mu_k^2\le\Lambda$ with $\Lambda=\frac{1}{\alpha}\big(\widetilde W(x^0,\mu_{-1})+\kappa\mu_{-1}-\min_{x\in\Omega}W(x)\big)+\frac{2\mu_0^2\sigma}{2\sigma-1}<\infty$;
Proof. (ⅰ) Recalling (3.21), for all k∈N, we obtain
$$\gamma_k\mu_k^{-1}\|x^{k+1}-x^k\|^2\le 2\big(\widetilde W(x^k,\mu_{k-1})+\kappa\mu_{k-1}-\widetilde W(x^{k+1},\mu_k)-\kappa\mu_k\big). \tag{3.22}$$
Now adding up the above inequality over k=0,…,K, it gives
$$\sum_{k=0}^{K}\gamma_k\mu_k^{-1}\|x^{k+1}-x^k\|^2\le 2\big(\widetilde W(x^0,\mu_{-1})+\kappa\mu_{-1}-\widetilde W(x^{K+1},\mu_K)-\kappa\mu_K\big). \tag{3.23}$$
By letting $K$ in (3.23) tend to infinity and using (3.12), we obtain (ⅰ).
(ⅱ) From (3.5), we have
$$\sum_{k\in P_s}\mu_k^2=\sum_{r=1}^{\infty}\frac{\mu_0^2}{(p_s^r+1)^{2\sigma}}\le\sum_{k=1}^{\infty}\frac{\mu_0^2}{k^{2\sigma}}\le\frac{2\mu_0^2\sigma}{2\sigma-1}, \tag{3.24}$$
where psr is the rth smallest element in Ps. When k∉Ps, (3.9) gives
$$\alpha\mu_k^2\le\widetilde W(x^k,\mu_{k-1})+\kappa\mu_{k-1}-\widetilde W(x^{k+1},\mu_k)-\kappa\mu_k,$$
which together with the non-increasing property of {˜W(xk+1,μk)+κμk} and (3.12) implies
$$\sum_{k\notin P_s}\mu_k^2\le\frac{1}{\alpha}\big(\widetilde W(x^0,\mu_{-1})+\kappa\mu_{-1}-\min_\Omega W\big). \tag{3.25}$$
Combining (3.24) and (3.25), the proof of (ii) is completed.
Theorem 3.5. If there is an accumulation point in {xk:k∈Ps}, then the accumulation point is a lifted stationary point of (1.7).
Proof. Since (3.9) fails for k∈Ps, by rearranging (3.21), we obtain that
$$\gamma_k\mu_k^{-1}\|x^{k+1}-x^k\|^2\le 2\alpha\mu_k^2,$$
which gives
$$\|x^{k+1}-x^k\|\le\sqrt{2\alpha\gamma_k^{-1}\mu_k^3}.$$
Thus,
$$\gamma_k\mu_k^{-1}\|x^{k+1}-x^k\|\le\sqrt{2\alpha\gamma_k\mu_k},$$
which together with $\lim_{k\to\infty}\mu_k=0$ and $\{\gamma_k\}\subseteq[\underline\gamma,\max\{\bar\gamma,\rho L\}]$ implies
$$\lim_{k\to\infty}\gamma_k\mu_k^{-1}\|x^{k+1}-x^k\|=0\quad\text{and}\quad\lim_{k\to\infty}\|x^{k+1}-x^k\|=0. \tag{3.26}$$
Let $\bar x$ be an accumulation point of $\{x^k\}_{k\in P_s}$; then there exists a subsequence $\{x^{k_t}\}_{k_t\in P_s}$ converging to $\bar x$. Combining this with (3.26), the same analysis restricted to $k_t\in P_s$ implies
$$\lim_{t\to\infty}\gamma_{k_t}\mu_{k_t}^{-1}\|x^{k_t+1}-x^{k_t}\|=0\quad\text{and}\quad\lim_{t\to\infty}x^{k_t+1}=\bar x. \tag{3.27}$$
Recalling that $x^{k_t+1}=\hat x^{k_t+1}$ is defined in (3.6), its first-order necessary optimality condition gives
$$\langle\nabla\tilde f(x^{k_t},\mu_{k_t})+\gamma_{k_t}\mu_{k_t}^{-1}(x^{k_t+1}-x^{k_t})+\lambda\zeta^{k_t},\,x-x^{k_t+1}\rangle\ge 0,\quad\forall\zeta^{k_t}\in\partial\Phi_{d^{k_t}}(Wx^{k_t+1}),\ x\in\Omega. \tag{3.28}$$
Since $\{d^{k_t}:t\in\mathbb{N}\}$ takes only finitely many values and $\lim_{t\to\infty}x^{k_t+1}=\bar x$, there exist a subsequence of $\{k_t\}$, denoted by $\{k_{t_j}\}$, and $\bar d\in D^p(W\bar x)$ such that $d^{k_{t_j}}=\bar d$ for all $j\in\mathbb{N}$. By the upper semicontinuity of $\partial\Phi_{\bar d}$ and $\lim_{j\to\infty}x^{k_{t_j}+1}=\bar x$, it follows that
$$\Big\{\lim_{j\to\infty}\zeta^{k_{t_j}}:\zeta^{k_{t_j}}\in\partial\Phi_{d^{k_{t_j}}}(Wx^{k_{t_j}+1})\Big\}\subseteq\partial\Phi_{\bar d}(W\bar x). \tag{3.29}$$
Along the subsequence $\{k_{t_j}\}$, letting $j\to\infty$ in (3.28) and using Definition 3.1-(ⅲ), (3.27) and (3.29), we obtain that there exist $\bar\xi\in\partial f(\bar x)$ and $\bar\zeta_{\bar d}\in\partial\Phi_{\bar d}(W\bar x)$ such that
$$\langle\bar\xi+\lambda\bar\zeta_{\bar d},\,x-\bar x\rangle\ge 0,\quad\forall x\in\Omega. \tag{3.30}$$
Since $\bar d\in D^p(W\bar x)$ and $f+\lambda\Phi_{\bar d}$ is convex, (3.30) implies
$$f(x)+\lambda\Phi_{\bar d}(Wx)-f(\bar x)-\lambda\Phi_{\bar d}(W\bar x)\ge\langle\bar\xi+\lambda\bar\zeta_{\bar d},\,x-\bar x\rangle\ge 0,\quad\forall x\in\Omega,$$
which implies that ˉx is a lifted stationary point of (1.7).
The purpose of this section is to verify the theoretical results and illustrate the properties of the SGD algorithm through numerical experiments. We present Examples 4.1 and 4.2, which are respectively an under-determined linear regression problem and an over-determined censored regression problem. In particular, the process of solving subproblem (3.6) is very similar to that of solving a LASSO problem.
All experiments are performed in MATLAB 2016a on a Lenovo PC with an Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz 1801 MHz and 8GB RAM. In the following examples, the stopping criterion is set as
$$\text{number of iterations}\le\text{Maxiter}\quad\text{or}\quad\mu_k\le\varepsilon. \tag{4.1}$$
That is, we stop the proposed algorithm if the number of iterations exceeds Maxiter or the smoothing parameter is less than $\varepsilon$. Denote by $\bar x$ the output of the iterate $x^k$. We set the fixed parameter $\alpha=1$ throughout the numerical experiments.
Example 4.1. (Linear regression problem) Linear regression problems have been widely used in information theory [1], signal processing [35,36] and image restoration [6,36]. As pointed out in [20], the $\ell_1$ loss function is nonsmooth, but it is more robust and has a stronger capability of outlier-resistance than the least squares loss function in linear regression problems. We therefore consider the following $\ell_0$ regularized linear regression problem with $\ell_1$ loss function:
$$\min_{x\in\Omega}W_{\ell_0}(x):=\frac{1}{m}\|Ax-b\|_1+\lambda\|Wx\|_0, \tag{4.2}$$
where $A\in\mathbb{R}^{m\times n}$ with $m=n$, and $b\in\mathbb{R}^m$. A smoothing function of the $\ell_1$ loss function can be defined by
$$\tilde f(x,\mu)=\frac{1}{m}\sum_{i=1}^{m}\tilde\theta(A_ix-b_i,\mu)\quad\text{with}\quad\tilde\theta(s,\mu)=\begin{cases}|s| & \text{if } |s|>\mu,\\[2pt] \dfrac{s^2}{2\mu}+\dfrac{\mu}{2} & \text{if } |s|\le\mu.\end{cases} \tag{4.3}$$
Denote by $s$ the $\ell_0$ norm of the true solution $x^*$, i.e., $\|Wx^*\|_0=s$. For the given positive integers $m,n$ and $s$, the data are generated by
W = randn(p,n); B = randn(n,m); A = orth(B)'; b = A*x^* + 0.01*randn(m,1);
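For reference, a possible MATLAB realization of the smoothing function (4.3) and of the gradient used by the algorithm is sketched below; the handle names are ours, and A, b, m denote the problem data generated above:

```matlab
% Huber-type smoothing of |s| as in (4.3) and its derivative in s
theta_sm = @(s, mu) abs(s).*(abs(s) > mu) + (s.^2/(2*mu) + mu/2).*(abs(s) <= mu);
dtheta   = @(s, mu) sign(s).*(abs(s) > mu) + (s/mu).*(abs(s) <= mu);
f_sm     = @(x, mu) mean(theta_sm(A*x - b, mu));       % smoothing function of the ell_1 loss
grad_f   = @(x, mu) (A'*dtheta(A*x - b, mu))/m;        % its gradient with respect to x
```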
In the algorithm, we set the parameters as follows: $\underline\gamma=\bar\gamma=1$, $\mu_0=3.533$, Maxiter $=10^3$, $\nu=35.6014$, $\sigma=3.0003$, $\rho=1.0001$, $\kappa=\frac12$. We generate $A$, $b$ and $x^*$ with $m=n=45$, $p=45$ and $s=2$, set $\lambda=10^{-3}$ in (4.2) and $\varepsilon=10^{-3}$ in the stopping criterion (4.1), and set $x^0$ = ones(n,1). Figure 1 plots $x^*$ and $\bar x$, where $x^*$ and $\bar x$ denote the original signal (the true solution) and the output of the iterate $x^k$ from the SGD algorithm. From Figure 1, we can see that the output of $x^k$ is very close to the original generated signal.
Now we use another form of matrix W to solve Example 4.1:
$$W=\begin{pmatrix}0&0&0&\cdots&0&0\\-3&1&0&\cdots&0&0\\0&-3&1&\cdots&0&0\\\vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\0&0&0&\cdots&-3&1\end{pmatrix}_{p\times n}.$$
Set $\underline\gamma=\bar\gamma=1$, $\mu_0=3.533$, Maxiter $=10^3$, $\nu=36$, $\sigma=7$, $\rho=1.0001$ and $\kappa=\frac12$. We randomly generate the data as follows:
B = randn(n,m); A = orth(B)'; b = A*x^* + 0.01*randn(m,1);
We run numerical experiments with $(m,n,p,s)=(45,45,45,2)$. Set $\lambda=10^{-3}$ in (4.2) and $\varepsilon=10^{-3}$ in the stopping criterion (4.1), and set $x^0$ = randn(n,1). From Figure 2, we can see that the output of $x^k$ obtained by the SGD algorithm is also close to the true solution $x^*$.
The last special case of W is the penalty matrix in 1-dimensional Fused LASSO [39]:
$$W=\begin{pmatrix}-1&1&0&\cdots&0&0\\0&-1&1&\cdots&0&0\\0&0&-1&\cdots&0&0\\\vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\0&0&0&\cdots&-1&1\end{pmatrix}_{p\times n}.$$
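This $W$ is the standard first-order difference (fused LASSO) penalty matrix; one possible sparse construction in MATLAB (illustrative, not the authors' code) is:

```matlab
% p-by-n matrix with -1 on the main diagonal and +1 on the first superdiagonal
W = spdiags([-ones(p,1), ones(p,1)], [0, 1], p, n);
```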
Set $\nu=40$, $p=45$, $n=46$ and $x^0$ = ones(n,1). The remaining parameter values are the same as in the previous case (see Figure 3). From Figures 1–3, we can see that the output of $x^k$ obtained by the SGD algorithm is close to the true solution $x^*$.
Example 4.2. (Censored regression problem) Censored regression problems have been studied in machine learning [37], economics [38], biomedicine, and technical systems. The following censored regression problem is also a typical class of composite sparse optimization problems with nonsmooth convex loss functions. We consider the following $\ell_0$ regularized censored regression problem:
$$\min_{x\in\Omega}W_{\ell_0}(x):=\frac{1}{m}\|\max\{Ax,0\}-b\|_1+\lambda\|Wx\|_0,$$
where $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$. For the loss function in (1.5), a smoothing function can be defined by
$$\tilde f(x,\mu)=\frac{1}{m}\sum_{i=1}^{m}\tilde\theta\big(\tilde\phi(A_ix,\mu)-b_i,\mu\big)\quad\text{with}\quad\tilde\phi(s,\mu)=\begin{cases}\max\{s,0\} & \text{if } |s|>\mu,\\[2pt] \dfrac{(s+\mu)^2}{4\mu} & \text{if } |s|\le\mu.\end{cases}$$
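Analogously to (4.3), this smoothed censored loss can be realized in MATLAB as follows (handles are illustrative; theta_sm is the smoothing of $|\cdot|$ from Example 4.1):

```matlab
% Smoothing of max{s,0} and the resulting smoothed censored regression loss
phi_sm = @(s, mu) max(s, 0).*(abs(s) > mu) + ((s + mu).^2/(4*mu)).*(abs(s) <= mu);
f_sm   = @(x, mu) mean(theta_sm(phi_sm(A*x, mu) - b, mu));
```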
Set $\varepsilon=10^{-2}$, $\nu=16.0009$, $\lambda=10^{-3}$, $\mu_0=10.8999$, $\sigma=4.0003$, $\kappa=\frac12$, $\rho=1.2006$ and $x^0$ = randn(n,1). In this example, we run numerical experiments with $(m,n,p,s)=(40,40,40,2)$ and randomly generate the problem data as follows:
A = randn(m,n); W = randn(p,n); b = max(A*x^* + 0.01*randn(m,1), 0);
The computational results for $x^*$ and $\bar x$ are shown in Figure 4.
We use the following form of W to solve Example 4.2:
$$W=\begin{pmatrix}0&-2&0&0&\cdots&0&0\\-2&1&-3&0&\cdots&0&0\\0&-3&1&-4&\cdots&0&0\\\vdots&\vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\0&0&0&0&\cdots&1&-m\\0&0&0&0&\cdots&-m&1\end{pmatrix}_{p\times n}.$$
Set $\underline\gamma=\bar\gamma=3$, $\mu_0=2$, Maxiter $=10^3$, $\nu=1.2200$, $\sigma=2.6915$, $\rho=1.0001$ and $\kappa=0.711$. For the given positive integers $m,n$ and $s$, the data are generated by
A = randn(m,n); b = A*x^* + 0.01*randn(m,1);
We run numerical experiments with $(m,n,p,s)=(40,40,40,2)$. Set $\lambda=10^{-3}$ in (4.2), $x^0$ = ones(n,1) and $\varepsilon=10^{-3}$. From Figures 4 and 5, it can be seen that the output of $x^k$ is very close to the true solution.
We have studied the composite sparse optimization problem consisting of the sum of a nonsmooth convex function and the ℓ0 penalty of a matrix times the coefficient vector. Considering the original cardinality penalty problem and an exact continuous relaxation problem with a capped-ℓ1 penalty, we have proved several results: the global minimizers of the relaxation problem and of the original problem coincide, and lifted stationary points of the relaxation problem are local minimizers of the original problem. We proposed the SGD algorithm, which combines the smoothing method with a gradient descent scheme, and investigated it from both a theoretical and an algorithmic point of view. We proved that if the sequence generated by the algorithm has an accumulation point, then it is a lifted stationary point of the relaxation problem. This explains, from the theoretical perspective, why the algorithm is expected to enjoy an appealing performance, which is testified by the numerical experiments. Our initial numerical results confirm the underlying theoretical predictions.
Wei Yang: Conceptualization, writing-original draft, validation, software; Lili Pan: Conceptualization, supervision, funding acquisition, validation, software; Jinhui Wan: Software, methodology, data curation. All authors have read and approved the final version of the manuscript for publication.
The work of the authors was supported by the National Natural Science Foundation of China grant 12271309.
The authors declare no conflicts of interest.
[1] | E. J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE T. Inform. Theory, 52 (2006), 489–509. |
[2] | D. L. Donoho, Compressed sensing, IEEE T. Inform. Theory, 52 (2006), 1289–1306. |
[3] | F. Facchinei, Minimization of SC1 functions and the Maratos effect, Oper. Res. Lett., 17 (1995), 131–137. https://doi.org/10.1016/0167-6377(94)00059-F |
[4] | M. Elad, Sparse and redundant representations: From theory to applications in signal and image processing, Springer Science & Business Media, 2010. |
[5] | M. Elad, M. A. Figueiredo, Y. Ma, On the role of sparse and redundant representations in image processing, P. IEEE, 98 (2010), 972–982. https://doi.org/10.1109/JPROC.2009.2037655 |
[6] | W. Bian, X. Chen, Linearly constrained non-Lipschitz optimization for image restoration, SIAM J. Imaging Sci., 8 (2015), 2294–2322. https://doi.org/10.1137/140985639 |
[7] | X. Chen, M. K. Ng, C. Zhang, Non-Lipschitz ℓp-regularization and box constrained model for image restoration, IEEE T. Image Process., 21 (2012), 4709–4721. https://doi.org/10.1109/TIP.2012.2214051 |
[8] | J. Fan, L. Xue, H. Zou, Strong oracle optimality of folded concave penalized estimation, Ann. Stat., 42 (2014), 819. https://doi.org/10.1214/13-AOS1198 |
[9] | W. Bian, X. Chen, A smoothing proximal gradient algorithm for nonsmooth convex regression with cardinality penalty, SIAM J. Numer. Anal., 58 (2020), 858–883. https://doi.org/10.1137/18M1186009 |
[10] | X. Li, Z. Yang, X. Chen, Quantitative damage detection and sparse sensor array optimization of carbon fiber reinforced resin composite laminates for wind turbine blade structural health monitoring, Sensors, 14 (2014), 7312–7331. https://doi.org/10.3390/s140407312 |
[11] | W. Huang, Q. Fu, H. Dou, Z. Dong, Resonance-based sparse signal decomposition based on genetic optimization and its application to composite fault diagnosis of rolling bearings, In: ASME International Mechanical Engineering Congress and Exposition (Vol. 57403, p. V04BT04A054). American Society of Mechanical Engineers, 2015. https://doi.org/10.1115/IMECE2015-50874 |
[12] | L. L. Beyer, N. Balabanska, E. Tal, S. Karaman, Multi-modal motion planning using composite pose graph optimization, In: 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, 9981–9987. |
[13] | R. Zhou, Y. Wang, B. Qiao, W. Zhu, J. Liu, X. Chen, Impact force identification on composite panels using fully overlapping group sparsity based on Lp-norm regularization, Struct. Health Monit., 23 (2024), 137–161. |
[14] | J. Liu, L. Yuan, J. Ye, Guaranteed sparse recovery under linear transformation, In: International Conference on Machine Learning, 2013, 91–99. |
[15] | R. J. Tibshirani, The solution path of the generalized lasso, Stanford University, 2011. |
[16] | B. Xin, Y. Kawahara, Y. Wang, W. Gao, Efficient generalized fused lasso and its application to the diagnosis of Alzheimer's disease, In: Proceedings of the AAAI Conference on Artificial Intelligence, 2014. https://doi.org/10.1145/2847421 |
[17] | R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, K. Knight, Sparsity and smoothness via the fused lasso, J. R. Stat. Soc. B, 67 (2005), 91–108. https://doi.org/10.1111/j.1467-9868.2005.00490.x |
[18] | L. I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D, 60 (1992), 259–268. |
[19] | S. J. Kim, K. Koh, S. Boyd, D. Gorinevsky, ℓ1 trend filtering, SIAM Rev., 51 (2009), 339–360. https://doi.org/10.1137/070690274 |
[20] | J. Fan, R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Stat. Assoc., 96 (2001), 1348–1360. https://doi.org/10.1198/016214501753382273 |
[21] | Z. Zheng, Y. Fan, J. Lv, High dimensional thresholded regression and shrinkage effect, J. R. Stat. Soc. B, 76 (2014), 627–649. https://doi.org/10.1111/rssb.12037 |
[22] | D. Peleg, R. Meir, A bilinear formulation for vector sparsity optimization, Signal Process., 88 (2008), 375–389. |
[23] | C. S. Ong, L. T. H. An, Learning sparse classifiers with difference of convex functions algorithms, Optim. Method. Softw., 28 (2013), 830–854. |
[24] | T. Zhang, Multi-stage convex relaxation for feature selection, Bernoulli, 19 (2013), 2277–2293. https://doi.org/10.3150/12-BEJ452 |
[25] | W. Jiang, F. Nie, H. Huang, Robust dictionary learning with capped ℓ1-norm, In: Twenty-fourth international joint conference on artificial intelligence, 2015. |
[26] | L. Pan, X. Chen, Group sparse optimization for images recovery using capped folded concave functions, SIAM J. Imaging Sci., 14 (2021), 1–25. |
[27] | M. Nikolova, Local strong homogeneity of a regularized estimator, SIAM J. Appl. Math., 61 (2000), 633–658. |
[28] | M. Chen, Q. Wang, S. Chen, X. Li, Capped ℓ1-norm sparse representation method for graph clustering, IEEE Access, 7 (2019), 54464–54471. |
[29] | Z. Xue, L. Cai, Robust fisher-regularized twin extreme learning machine with capped ℓ1-norm for classification, Axioms, 12 (2023), 717. |
[30] | E. Soubies, L. Blanc-Féraud, G. Aubert, A unified view of exact continuous penalties for ℓ2-ℓ0 minimization, SIAM J. Optimiz., 27 (2017), 2034–2060. https://doi.org/10.1137/16m1059333 |
[31] | D. Gabay, B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation, Comput. Math. Appl., 2 (1976), 17–40. https://doi.org/10.1016/0898-1221(76)90003-1 |
[32] | Z. J. Bai, M. K. Ng, L. Qi, A coordinate gradient descent method for nonsmooth nonseparable minimization, Numer. Math.-Theory Me., 2 (2009), 377–402. https://doi.org/10.4208/nmtma.2009.m9002s |
[33] | J. S. Pang, M. Razaviyayn, A. Alvarado, Computing B-stationary points of nonsmooth DC programs, Math. Oper. Res., 42 (2017), 95–118. https://doi.org/10.1287/MOOR.2016.0795 |
[34] | H. Lütkepohl, Handbook of matrices, John Wiley & Sons, 1997. |
[35] | A. M. Bruckstein, D. L. Donoho, M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev., 51 (2009), 34–81. |
[36] | M. Nikolova, M. K. Ng, Analysis of half-quadratic minimization methods for signal and image recovery, SIAM J. Sci. Comput., 27 (2005), 937–966. |
[37] | C. Cortes, V. Vapnik, Support-vector networks, Machine Learning, 20 (1995), 273–297. https://doi.org/10.1023/A:1022627411411 |
[38] | R. Blundell, J. L. Powell, Censored regression quantiles with endogenous regressors, J. Econometrics, 141 (2007), 65–83. |
[39] | T. B. Arnold, R. J. Tibshirani, Efficient implementations of the generalized lasso dual path algorithm, J. Comput. Graph. Stat., 25 (2016), 1–27. |
[40] | Z. Shen, Q. Chen, F. Yang, A convex relaxation framework consisting of a primal-dual alternative algorithm for solving ℓ0 sparsity-induced optimization problems with application to signal recovery based image restoration, J. Comput. Appl. Math., 421 (2023), 114878. |
[41] | Q. Chen, Z. Shen, A two-metric variable scaled forward-backward algorithm for ℓ0 optimization problem and its applications, Numer. Algorithms, 97 (2024), 191–221. |
[42] | T. Zhang, Analysis of multi-stage convex relaxation for sparse regularization, J. Mach. Learn. Res., 11 (2010). |
[43] | H. Zou, R. Li, One-step sparse estimates in nonconcave penalized likelihood models, Ann. Stat., 36 (2008), 1509. |