
    Support vector machines (SVMs) are robust computational tools for supervised learning, commonly employed in classification and regression tasks. Grounded in statistical learning theory and the principle of structural risk minimization, SVMs identify an optimal separating hyperplane that maximizes the margin between positive and negative examples [1]. This approach has proven effective in diverse applications such as particle identification, face recognition, text categorization, and bioinformatics.

    Expanding on the conventional SVM framework, Mangasarian and Wild [2] introduced the generalized eigenvalue proximal support vector machine (GEPSVM), which addresses binary classification by identifying two distinct hyperplanes, one for each category. Each data point is then assigned to the class of the hyperplane it lies closest to, and the training problem is transformed into a generalized eigenvalue formulation whose solutions are the eigenvectors associated with the smallest eigenvalues.

    To further enhance classification efficiency, Jayadeva et al. [3] proposed the twin support vector machine (TWSVM), which also constructs two non-parallel hyperplanes. Unlike GEPSVM, TWSVM involves solving two smaller quadratic programming problems (QPPs) instead of a single large-scale QPP. This design reduces computational complexity, as shown in experimental evaluations that demonstrate TWSVM's superior performance over GEPSVM and standard SVMs on datasets from the University of California, Irvine (UCI) machine learning repository [4].

    Focusing on computational simplicity and scalability, Kumar and Gopal [5] developed the least squares twin support vector machine (LSTSVM) as an extension of TWSVM [6]. LSTSVM reformulates the primal QPPs of TWSVM using least squares principles, replacing the inequality constraints with equality constraints. Consequently, the optimization reduces to solving two systems of linear equations, bypassing the need for external optimizers. This approach also accommodates nonlinear kernels while maintaining computational efficiency. Empirical comparisons across various UCI and artificial datasets confirm LSTSVM's faster training time and competitive classification accuracy relative to TWSVM and the traditional SVM.

    Despite these advantages, traditional LSTSVM minimizes the $ L_2 $-norm of error variables, making it sensitive to outliers. This sensitivity can degrade classification accuracy, especially in noisy or imbalanced datasets. To address this, several enhancements have been proposed to improve the robustness of LSTSVM and related models. Furthermore, the standard LSTSVM reduces the coefficients of irrelevant features without eliminating any of them entirely. As a result, if the dataset contains many irrelevant features, using the standard LSTSVM may lead to a complex model with numerous included features since none of the irrelevant coefficients are reduced to zero.

    Gao et al. [7] introduced the $ L_1 $-norm least squares twin support vector machine, which replaces the $ L_2 $-norm with the $ L_1 $-norm in the objective function to promote robustness and handle outliers effectively. This reformulation also promotes sparsity and feature suppression. By converting the constrained problem into an unconstrained convex quadratic form, they solve it efficiently using a generalized Newton method.

    Yan et al. [8] proposed the $ L_1 $-norm-based least squares twin bounded support vector machine, replacing all conventional $ L_2 $-norms with $ L_1 $-norms to reduce outlier influence. The optimization problems are addressed through an iterative reweighting technique.

    Wang et al. [9] introduced the robust capped $ L_1 $-norm twin support vector machine with privileged information, which incorporates the learning using privileged information framework. The capped $ L_1 $-norm enhances robustness, while upper and lower bound constraints on both main and privileged features control noise sensitivity. An alternating minimization approach is used to solve the optimization problems.

    In a related line of work, the robust capped $ L_1 $-norm projection twin support vector machine (CPTSVM) was proposed to improve the outlier resistance of PTSVM models by Yang et al. [10]. By replacing the squared $ L_2 $-norm with the capped $ L_1 $-norm, the CPTSVM formulation increases classifier robustness in the presence of noise and outliers. Though the resulting problems are non-convex and non-smooth, an iterative algorithm with proven convergence is employed to solve them. Experiments on artificial and benchmark datasets demonstrate the model's robustness and effectiveness.

    These related works inform the design of our proposed method, which integrates $ L_1 $-norm regularization into the LSTSVM framework and leverages the alternating direction method of multipliers (ADMM) [11,12,13] for scalable optimization. In contrast to prior methods, our approach introduces acceleration mechanisms and guard conditions to ensure both robustness and fast convergence on large and noisy datasets.

    This section briefly reviews the fundamental concepts of TWSVM, LSTSVM, ADMM, and the Lasso technique.

    TWSVM is a classification technique developed to reduce the computational burden of traditional SVM. Instead of finding a single hyperplane to separate two classes, TWSVM constructs two non-parallel hyperplanes. Each hyperplane is positioned closer to one class while maintaining the maximum possible distance from the other. Given a dataset $ D $ with $ m_1 $ and $ m_2 $ training points labeled $ +1 $ and $ -1, $ respectively, in $ \mathbb{R}^n $, the data points for class $ +1 $ are represented by matrix $ A \in \mathbb{R}^{m_1\times n} $, while matrix $ B \in \mathbb{R}^{m_2\times n} $ represents class $ -1. $ The linear TWSVM is defined by:

    $ x^Tw_1 +b_1 = 0 $

    and

    $ x^Tw_2 +b_2 = 0, $

    where $ w_1, w_2 \in \mathbb{R}^n $ are normal vectors, and $ b_1, b_2 \in \mathbb{R} $ are bias terms. These hyperplanes are obtained by solving two separate optimization problems, each associated with one class:

    $ \min\limits_{w_1, b_1, \xi_2} \ \dfrac{1}{2}\left\lVert Aw_1+e_1b_1\right\rVert_2^2 + c_1e_2^T\xi_2 \quad \text{s.t.} \quad -(Bw_1+e_2b_1)+\xi_2 \geq e_2, \ \ \xi_2 \geq 0, $ (2.1)

    and

    $ \min\limits_{w_2, b_2, \xi_1} \ \dfrac{1}{2}\left\lVert Bw_2+e_2b_2\right\rVert_2^2 + c_2e_1^T\xi_1 \quad \text{s.t.} \quad (Aw_2+e_1b_2)+\xi_1 \geq e_1, \ \ \xi_1 \geq 0, $ (2.2)

    where $ c_1, c_2 > 0 $ are parameters, $ {\mathbf \xi}_1 \in \mathbb{R}^{m_{1}} $, $ {\mathbf \xi}_2 \in \mathbb{R}^{m_{2}} $ are slack vectors, and $ e_1 $ and $ e_2 $ are vectors of ones of appropriate dimensions.

    Using the Lagrangian dual method on (2.1) and (2.2), and introducing the Lagrange multipliers $ \alpha \in \mathbb{R}^{m_2} $ and $ \beta \in \mathbb{R}^{m_1} $, the resulting Wolfe dual formulations are:

    $ \max\limits_{\alpha} \ e_2^T\alpha - \dfrac{1}{2}\alpha^TG\left(H^TH\right)^{-1}G^T\alpha \quad \text{s.t.} \quad 0 \leq \alpha \leq c_1, $ (2.3)

    where

    $ G = [B \ \ e_2] $

    and

    $ H = [A \ \ e_1]. $

    And

    $ \max\limits_{\beta} \ e_1^T\beta - \dfrac{1}{2}\beta^TP\left(Q^TQ\right)^{-1}P^T\beta \quad \text{s.t.} \quad 0 \leq \beta \leq c_2, $ (2.4)

    where

    $ P = [A \ \ e_1] $

    and

    $ Q = [B \ \ e_2]. $

    Solving (2.3) and (2.4) yields the two non-parallel hyperplanes:

    $ u = -\left(H^TH\right)^{-1}G^T\alpha, \ \ \text{where} \ \ u = [w_1 \ \ b_1]^T, \qquad v = \left(Q^TQ\right)^{-1}P^T\beta, \ \ \text{where} \ \ v = [w_2 \ \ b_2]^T. $ (2.5)

    In comparison to SVM, the dual QPPs of TWSVM are smaller: (2.3) and (2.4) involve only $ m_2 $ and $ m_1 $ variables, respectively, whereas the dual QPP of SVM depends on all $ m_1+m_2 $ variables.
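
    To make the construction concrete, the following sketch shows one way the dual (2.3) could be solved numerically and the first hyperplane recovered via (2.5). It is an illustrative implementation only: the generic box-constrained solver from SciPy and the small ridge term added before inversion are assumptions of this sketch, not part of the TWSVM formulation.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def twsvm_first_plane(A, B, c1, eps=1e-6):
        """Illustrative sketch: solve the TWSVM dual (2.3), then recover u = [w1; b1] by (2.5)."""
        m1, m2 = A.shape[0], B.shape[0]
        H = np.hstack([A, np.ones((m1, 1))])              # H = [A  e1]
        G = np.hstack([B, np.ones((m2, 1))])              # G = [B  e2]
        HtH_inv = np.linalg.inv(H.T @ H + eps * np.eye(H.shape[1]))  # small ridge for stability (assumption)
        M = G @ HtH_inv @ G.T                             # dual Hessian  G (H^T H)^{-1} G^T
        e2 = np.ones(m2)

        # maximize  e2^T a - 0.5 a^T M a   <=>   minimize  0.5 a^T M a - e2^T a,  with 0 <= a <= c1
        obj  = lambda a: 0.5 * a @ M @ a - e2 @ a
        grad = lambda a: M @ a - e2
        res = minimize(obj, x0=np.zeros(m2), jac=grad,
                       bounds=[(0.0, c1)] * m2, method="L-BFGS-B")
        alpha = res.x
        u = -HtH_inv @ G.T @ alpha                        # Eq. (2.5): u = [w1; b1]
        return u[:-1], u[-1]                              # w1, b1
    ```

    The second plane follows analogously from (2.4), with the roles of the two classes exchanged.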

    In cases involving nonlinear data, the kernel functions are introduced, allowing the two hyperplanes of TWSVM in the kernel space to be represented as follows:

    $ K(x^T, C^T)u_1+b_1 = 0 \ \ \ \text{and} \ \ \ K(x^T, C^T)u_2+b_2 = 0, $ (2.6)

    where $ K $ is a properly selected kernel,

    $ C = [A \ \ B]^T $

    and $ u_1, u_2 \in \mathbb{R}^{m_1+m_2}. $ The nonlinear TWSVM optimization problems can be expressed as:

    $ \min\limits_{u_1, b_1, \xi_2} \ \dfrac{1}{2}\left\lVert K(A, C^T)u_1+e_1b_1\right\rVert_2^2 + c_1e_2^T\xi_2 \quad \text{s.t.} \quad -(K(B, C^T)u_1+e_2b_1)+\xi_2 \geq e_2, \ \ \xi_2 \geq 0, $ (2.7)

    and

    $ \min\limits_{u_2, b_2, \xi_1} \ \dfrac{1}{2}\left\lVert K(B, C^T)u_2+e_2b_2\right\rVert_2^2 + c_2e_1^T\xi_1 \quad \text{s.t.} \quad (K(A, C^T)u_2+e_1b_2)+\xi_1 \geq e_1, \ \ \xi_1 \geq 0. $ (2.8)

    To compute the hyperplanes for nonlinear TWSVM, the dual forms of (2.7) and (2.8) are derived and solved to obtain the hyperplanes in (2.6). However, solving nonlinear TWSVM involves handling two QPPs and requires inverting two matrices of sizes $ (m_1 \times m_1) $ and $ (m_2 \times m_2) $, which becomes computationally demanding for large datasets. This adds computational complexity compared to the linear case.

    The LSTSVM, proposed by Kumar and Gopal, is a binary classification technique that constructs two non-parallel hyperplanes. It solves two modified primal QPPs derived from TWSVM by applying a least squares approach in which the inequality constraints of (2.1) and (2.2) are replaced by equalities:

    $ \min\limits_{w_1, b_1, \xi_2} \ \dfrac{1}{2}\left\lVert Aw_1+e_1b_1\right\rVert_2^2 + \dfrac{c_1}{2}\xi_2^T\xi_2 \quad \text{s.t.} \quad -(Bw_1+e_2b_1)+\xi_2 = e_2, $ (2.9)

    and

    $ \min\limits_{w_2, b_2, \xi_1} \ \dfrac{1}{2}\left\lVert Bw_2+e_2b_2\right\rVert_2^2 + \dfrac{c_2}{2}\xi_1^T\xi_1 \quad \text{s.t.} \quad (Aw_2+e_1b_2)+\xi_1 = e_1, $ (2.10)

    where $ c_1, c_2 $ are the regularization parameters and $ \xi_1, \xi_2 $ are slack variables.

    In (2.9), the QPP incorporates the $ L_2 $-norm of the slack variable $ \xi_2 $ with a weight of $ \frac{c_1}{2} $ rather than the $ L_1 $-norm weighted by $ c_1 $ as used in (2.1). This change renders the constraint $ \xi_2 \geq 0 $ unnecessary. As a result, solving (2.9) reduces to solving a system of linear equations. By substituting the equality constraints directly into the objective function, the problem is rewritten as:

    $ \min\limits_{w_1, b_1} \ \dfrac{1}{2}\left\lVert Aw_1+e_1b_1\right\rVert_2^2 + \dfrac{c_1}{2}\left\lVert Bw_1+e_2b_1+e_2\right\rVert_2^2. $ (2.11)

    Setting the gradient of (2.11) with respect to $ w_1 $ and $ b_1 $ to zero yields a closed-form solution for QPP (2.9):

    $ \begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -\left(F^TF+\dfrac{1}{c_1}E^TE\right)^{-1}F^Te_2, $ (2.12)

    where

    $ E = [A \ \ e_1] $

    and

    $ F = [B \ \ e_2]. $

    Similarly, solving for the second hyperplane (2.10) gives:

    $ \begin{bmatrix} w_2 \\ b_2 \end{bmatrix} = \left(E^TE+\dfrac{1}{c_2}F^TF\right)^{-1}E^Te_1. $ (2.13)

    Therefore, the two nonparallel hyperplanes of LSTSVM can be obtained by inverting two matrices of dimension $ (n+1) \times(n+1), $ where $ n $ is the number of features, which is significantly smaller than the total number of training samples. This makes LSTSVM more computationally efficient than TWSVM while also improving generalization. The nonlinear case follows the same approach, replacing linear terms with kernel functions.
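
    As a minimal sketch of how (2.12) and (2.13) translate into code (assuming NumPy; the tiny ridge term guarding against a singular matrix and the distance-based prediction rule are implementation choices consistent with the description above):

    ```python
    import numpy as np

    def lstsvm_linear(A, B, c1, c2, eps=1e-8):
        """Sketch of linear LSTSVM: closed-form hyperplanes via (2.12) and (2.13)."""
        E = np.hstack([A, np.ones((A.shape[0], 1))])      # E = [A  e1]
        F = np.hstack([B, np.ones((B.shape[0], 1))])      # F = [B  e2]
        e1 = np.ones(A.shape[0])
        e2 = np.ones(B.shape[0])
        I = eps * np.eye(E.shape[1])                      # tiny ridge for numerical stability (assumption)

        z1 = -np.linalg.solve(F.T @ F + (1.0 / c1) * (E.T @ E) + I, F.T @ e2)   # [w1; b1], Eq. (2.12)
        z2 =  np.linalg.solve(E.T @ E + (1.0 / c2) * (F.T @ F) + I, E.T @ e1)   # [w2; b2], Eq. (2.13)
        return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])

    def lstsvm_predict(x, w1, b1, w2, b2):
        """Assign x to the class whose hyperplane lies closer (perpendicular distance)."""
        d1 = abs(x @ w1 + b1) / np.linalg.norm(w1)
        d2 = abs(x @ w2 + b2) / np.linalg.norm(w2)
        return 1 if d1 <= d2 else -1
    ```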

    ADMM was first introduced by Glowinski and Marroco [11] and Gabay and Mercier [12]. ADMM is an iterative optimization algorithm designed to decompose complex problems into manageable subproblems, which are then solved alternately. This approach ensures computational efficiency and makes ADMM particularly suitable for distributed and large-scale problems. ADMM solves optimization problems of the form:

    $ \min\limits_{x, z} \ f(x)+g(z) \quad \text{s.t.} \quad Ax+Bz = c, $ (2.14)

    where $ f $ and $ g $ are convex functions, and $ A $, $ B $, and $ c $ are given matrices/vectors. Since (2.14) is a constrained minimization problem, we can write the related augmented Lagrangian:

    $ L_\rho(x, z, y) = f(x)+g(z)+ y^T\left(Ax+Bz-c\right) + \dfrac{\rho}{2}\left\lVert{Ax+Bz-c}\right\rVert^2_2. $

    ADMM operates through a sequence of iterations:

    $ x^{(k+1)} := \operatorname*{argmin}\limits_{x} L_\rho\left(x, z^{(k)}, y^{(k)}\right), $ (2.15)
    $ z^{(k+1)} := \operatorname*{argmin}\limits_{z} L_\rho\left(x^{(k+1)}, z, y^{(k)}\right), $ (2.16)
    $ y^{(k+1)} := y^{(k)}+\rho\left(Ax^{(k+1)}+Bz^{(k+1)}-c\right), $ (2.17)

    where $ \rho > 0 $ is the penalty parameter and $ y $ is the Lagrangian dual variable. The algorithm involves three key steps. First, an $ x $-minimization step optimizes $ x $ using the augmented Lagrangian function $ L_\rho $ while keeping $ z $ and the dual variable $ y $ fixed. Next, a $ z $-minimization step updates $ z $ similarly. Finally, the dual variable $ y $ is updated using a step size proportional to the penalty parameter $ \rho. $

    The Lasso. Lasso regularization, which utilizes the $ L_1 $-norm, is an optimization and machine learning approach designed to reduce overfitting and promote sparsity in model parameters. It is often solved using ADMM, since ADMM is well-suited to convex optimization problems with separable objective functions and constraints. The corresponding Lasso formulation is expressed as:

    $ \min\limits_{\beta} \ \dfrac{1}{2}\left\lVert X\beta-y\right\rVert_2^2 + \tau\left\lVert\beta\right\rVert_1, $ (2.18)

    where $ y \in \mathbb{R}^n $, $ X \in \mathbb{R}^{n \times p} $, $ \beta \in \mathbb{R}^p $, and $ \tau > 0 $ is a scalar regularization parameter that controls the strength of the penalty.

    ADMM introduces an auxiliary variable $ z $ to separate the least squares and $ L_1 $ penalty terms:

    $ \min\limits_{\beta, z} \ \dfrac{1}{2}\left\lVert X\beta-y\right\rVert_2^2 + \tau\left\lVert z\right\rVert_1 \quad \text{s.t.} \quad \beta-z = 0. $ (2.19)

    ADMM solves this using the augmented Lagrangian:

    $ L_\rho(\beta, z, u) = \dfrac{1}{2}\left\lVert{X\beta-y}\right\rVert^2_2 + \tau\left\lVert{z}\right\rVert_1+\dfrac{\rho}{2}\left\lVert{\beta-z+u}\right\rVert^2_2, $

    where $ u $ is the scaled dual variable and $ \rho > 0 $ is the penalty parameter controlling convergence. ADMM then alternates between updating $ \beta, z, $ and $ u $:

    (1) Update $ \beta $ (least squares step):

    $ \beta^{(k+1)} = \left(X^TX+\rho I\right)^{-1}\left(X^Ty+\rho\left(z^{(k)}-u^{(k)}\right)\right), $ (2.20)

    where $ \left(X^TX+\rho I\right) $ is always invertible, since $ \rho > 0. $

    (2) Update $ z $ (soft-thresholding step):

    $ z^{(k+1)} = S_{\tau/\rho}\left(\beta^{(k+1)}+u^{(k)}\right), $ (2.21)

    where $ S_{\tau/\rho} $ is the soft-thresholding operator, defined as:

    $ S_{\varepsilon}(\nu) = \begin{cases} \nu-\varepsilon, & \text{if } \nu > \varepsilon, \\ 0, & \text{if } -\varepsilon \leq \nu \leq \varepsilon, \\ \nu+\varepsilon, & \text{if } \nu < -\varepsilon. \end{cases} $

    (3) Update $ u $ (dual variable update):

    $ u^{(k+1)} = u^{(k)}+\rho\left(\beta^{(k+1)}-z^{(k+1)}\right). $ (2.22)
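
    A compact NumPy sketch of these updates follows. The scaled-dual form used below, $ u \leftarrow u+\beta-z $, coincides with (2.22) when $ \rho = 1 $; the stopping rule on the change in $ z $ and the tolerance values are illustrative choices, not part of the formulation.

    ```python
    import numpy as np

    def soft_threshold(v, eps):
        """Elementwise soft-thresholding operator S_eps from the definition above."""
        return np.sign(v) * np.maximum(np.abs(v) - eps, 0.0)

    def lasso_admm(X, y, tau, rho=1.0, max_iter=500, tol=1e-6):
        """Sketch of ADMM for the Lasso (2.18) through the splitting (2.19)."""
        n, p = X.shape
        z = np.zeros(p)
        u = np.zeros(p)                                   # scaled dual variable
        XtX_rhoI = X.T @ X + rho * np.eye(p)              # invertible because rho > 0
        Xty = X.T @ y
        for _ in range(max_iter):
            beta = np.linalg.solve(XtX_rhoI, Xty + rho * (z - u))     # (2.20): least squares step
            z_new = soft_threshold(beta + u, tau / rho)               # (2.21): soft-thresholding step
            u = u + beta - z_new                                      # scaled dual update
            if np.linalg.norm(z_new - z) < tol:                       # illustrative stopping rule
                z = z_new
                break
            z = z_new
        return z
    ```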

    Lasso LSTSVM by ADMM. Building on the foundational concepts discussed earlier, this section presents the formulation for solving LSTSVM with $ L_1 $-norm regularization, also known as the Lasso technique, using the ADMM framework. ADMM is chosen over traditional optimization techniques due to its strong scalability and its ability to decompose complex objectives—particularly those involving non-smooth terms like the $ L_1 $-norm—into simpler subproblems. This makes it especially effective for high-dimensional, large-scale datasets.

    Linear case. In this work, we replace the $ L_2 $-norm of the penalty term in LSTSVM (2.9) and (2.10) with the $ L_1 $-norm, allowing the problem to be reformulated in a manner similar to the Lasso technique. This modification promotes sparsity in the slack variables and enables the use of ADMM to decompose the problem into simpler subproblems, improving computational efficiency. The modified optimization problems are formulated as follows:

    $ \min\limits_{w_1, b_1, \xi_2} \ \dfrac{1}{2}\left\lVert Aw_1+e_1b_1\right\rVert_2^2 + \dfrac{c_1}{2}\left\lVert\xi_2\right\rVert_1 \quad \text{s.t.} \quad -(Bw_1+e_2b_1)+\xi_2 = e_2, $ (3.1)

    and

    $ \min\limits_{w_2, b_2, \xi_1} \ \dfrac{1}{2}\left\lVert Bw_2+e_2b_2\right\rVert_2^2 + \dfrac{c_2}{2}\left\lVert\xi_1\right\rVert_1 \quad \text{s.t.} \quad (Aw_2+e_1b_2)+\xi_1 = e_1, $ (3.2)

    where $ c_1, c_2 $ are given positive parameters.

    To facilitate efficient computation, we reformulate the problems using auxiliary variables:

    $ \min\limits_{x_1, \xi_2} \ \dfrac{1}{2}\left\lVert Fx_1\right\rVert_2^2 + \tau_1\left\lVert\xi_2\right\rVert_1 \quad \text{s.t.} \quad -Ex_1+\xi_2 = e_2, $ (3.3)

    and

    $ \min\limits_{x_2, \xi_1} \ \dfrac{1}{2}\left\lVert Ex_2\right\rVert_2^2 + \tau_2\left\lVert\xi_1\right\rVert_1 \quad \text{s.t.} \quad Fx_2+\xi_1 = e_1, $ (3.4)

    where

    $ F = [A \ \ e_1], \ \ \ E = [B \ \ e_2], \ \ \ x_1 = [w_1 \ \ b_1]^T, $

    and

    $ x_2 = [w_2 \ \ b_2]^T. $

    The augmented Lagrangian for the first reformulated problem (e.g., for $ x_1 $ and $ \xi_2 $) is given by:

    $ L_\rho(x_1, \xi_2, y_1) = \dfrac{1}{2}\left\lVert Fx_1\right\rVert_2^2 + \tau_1\left\lVert\xi_2\right\rVert_1 + y_1^T\left(Ex_1-\xi_2+e_2\right) + \dfrac{\rho}{2}\left\lVert Ex_1-\xi_2+e_2\right\rVert_2^2, $ (3.5)

    where $ \rho > 0 $ serves as a penalty parameter and $ y_1 $ is the dual variable. The ADMM update rules for solving this problem are as follows:

    $ (1) $ $ x_1 $-update: Solve for $ x_1^{(k+1)} $:

    $ x_1^{(k+1)} = \operatorname*{argmin}\limits_{x_1}L_\rho(x_1, \xi_2^{(k)}, y_1^{(k)}), $

    resulting in:

    $ x_1^{(k+1)} = \left(F^TF+\rho E^TE\right)^{-1}\left[E^T\left(\rho \xi_2^{(k)}-\rho e_2-y_1^{(k)}\right)\right]. $

    $ (2) $ $ \xi_2 $-update: Solve for $ \xi_2^{(k+1)} $:

    $ \xi_2^{(k+1)} = \operatorname*{argmin}\limits_{\xi_2}L_\rho(x_1^{(k+1)}, \xi_2, y_1^{(k)}), $

    which simplifies to the soft-thresholding operation:

    $ \xi_2^{(k+1)} = S_{\tau_1/\rho}\left(Ex_1^{(k+1)}+e_2+\dfrac{y_1^{(k)}}{\rho}\right), $

    where $ S_{\tau_1/\rho}(\cdot) $ is the soft-thresholding operator.

    $ (3) $ Dual variable update: Update the dual variable $ y_1 $:

    $ y_1^{(k+1)} = y_1^{(k)}+\rho\left(Ex_1^{(k+1)}-\xi_2^{(k+1)}+e_2\right). $

    The iterative steps for solving the problem using ADMM are summarized in Algorithm 1:

    Algorithm 1 ADMM for solving Lasso LSTSVM 1st plane.
    % initialize $ \xi_2, y_1 $
    $ \xi_2^{(0)} \leftarrow \bar{\xi_2} $
    $ y_1^{(0)} \leftarrow \bar{y_1} $
    for $ k = 0, 1, 2, \ldots $ do
      $ x_1^{(k+1)} = \left(F^TF+\rho E^TE\right)^{-1}\left[E^T\left(\rho \xi_2^{(k)}-\rho e_2-y_1^{(k)}\right)\right] $
      $ \xi_2^{(k+1)} = S_{\tau_1/\rho}\left(Ex_1^{(k+1)}+e_2+\dfrac{y_1^{(k)}}{\rho}\right) $
      $ y_1^{(k+1)} = y_1^{(k)}+\rho\left(Ex_1^{(k+1)}-\xi_2^{(k+1)}+e_2\right) $
    end for
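
    A direct NumPy transcription of Algorithm 1 might look as follows; the zero initialization, the fixed iteration budget, and forming $ F^TF+\rho E^TE $ once outside the loop are implementation choices of this sketch rather than part of the algorithm statement.

    ```python
    import numpy as np

    def lasso_lstsvm_plane1(A, B, tau1, rho=1.0, iters=200):
        """Sketch of Algorithm 1: ADMM for the first Lasso LSTSVM plane, problem (3.3)."""
        F = np.hstack([A, np.ones((A.shape[0], 1))])      # F = [A  e1]
        E = np.hstack([B, np.ones((B.shape[0], 1))])      # E = [B  e2]
        e2 = np.ones(B.shape[0])
        xi2 = np.zeros(B.shape[0])                        # xi_2^{(0)}  (zero start assumed)
        y1 = np.zeros(B.shape[0])                         # y_1^{(0)}   (zero start assumed)
        M = F.T @ F + rho * (E.T @ E)                     # formed once and reused

        soft = lambda v, eps: np.sign(v) * np.maximum(np.abs(v) - eps, 0.0)
        for _ in range(iters):
            x1 = np.linalg.solve(M, E.T @ (rho * xi2 - rho * e2 - y1))   # x1-update
            xi2 = soft(E @ x1 + e2 + y1 / rho, tau1 / rho)               # xi2-update (soft-thresholding)
            y1 = y1 + rho * (E @ x1 - xi2 + e2)                          # dual update
        return x1[:-1], x1[-1]                            # w1, b1
    ```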

    Similarly, the second reformulated problem can be solved using the same concepts as the first problem, as previously demonstrated. The iterative steps for updating $ x_2, \xi_1, $ and $ y_2 $ are summarized in Algorithm 2:

    Algorithm 2 ADMM for solving Lasso LSTSVM 2nd plane.
    % initialize $ \xi_1, y_2 $
    $ \xi_1^{(0)} \leftarrow \bar{\xi_1} $
    $ y_2^{(0)} \leftarrow \bar{y_2} $
    for $ k = 0, 1, 2, \ldots $ do
      $ x_2^{(k+1)} = \left(E^TE+\rho F^TF\right)^{-1}\left[-F^T\left(\rho \xi_1^{(k)}-\rho e_1+y_2^{(k)}\right)\right] $
      $ \xi_1^{(k+1)} = S_{\tau_2/\rho}\left(-Fx_2^{(k+1)}+e_1-\dfrac{y_2^{(k)}}{\rho}\right) $
      $ y_2^{(k+1)} = y_2^{(k)}+\rho\left(Fx_2^{(k+1)}+\xi_1^{(k+1)}-e_1\right) $
    end for

    Nonlinear case. In real-world scenarios, the linear kernel method is not always applicable, as large-scale datasets often exhibit higher complexity. Therefore, we extend the algorithm to nonlinear Lasso LSTSVM using kernel techniques [14]. We modify the optimization problems (3.1) and (3.2) as follows:

    $ \min\limits_{\tilde{w}_1, b_1, \xi_2} \ \dfrac{1}{2}\left\lVert K(A, X)\tilde{w}_1+e_1b_1\right\rVert_2^2 + \dfrac{c_1}{2}\left\lVert\xi_2\right\rVert_1 \quad \text{s.t.} \quad -(K(B, X)\tilde{w}_1+e_2b_1)+\xi_2 = e_2, $ (3.6)

    and

    $ \min\limits_{\tilde{w}_2, b_2, \xi_1} \ \dfrac{1}{2}\left\lVert K(B, X)\tilde{w}_2+e_2b_2\right\rVert_2^2 + \dfrac{c_2}{2}\left\lVert\xi_1\right\rVert_1 \quad \text{s.t.} \quad (K(A, X)\tilde{w}_2+e_1b_2)+\xi_1 = e_1, $ (3.7)

    where

    $ K(\textbf{x}, \textbf{y}) = \phi(\textbf{x})^T\phi(\textbf{y}) $

    is a kernel function for any $ \textbf{x} $ and $ \textbf{y} $, the map $ \phi $ sends data points from $ \mathbb{R}^n $ to $ \mathbb{R}^m $ $ (n < m) $, and $ X $ denotes the matrix of all training samples from both classes (the analogue of $ C $ in (2.6)).

    The model can be reformulated as the following:

    $ \min\limits_{x_1, \xi_2} \ \dfrac{1}{2}\left\lVert Gx_1\right\rVert_2^2 + \tau_1\left\lVert\xi_2\right\rVert_1 \quad \text{s.t.} \quad -Hx_1+\xi_2 = e_2, $ (3.8)

    and

    $ \min\limits_{x_2, \xi_1} \ \dfrac{1}{2}\left\lVert Hx_2\right\rVert_2^2 + \tau_2\left\lVert\xi_1\right\rVert_1 \quad \text{s.t.} \quad Gx_2+\xi_1 = e_1, $ (3.9)

    where

    $ G = [K(A, X) \ \ e_1], \ \ \ H = [K(B, X) \ \ e_2], \ \ \ x_1 = [\tilde{w}_1 \ \ b_1]^T, $

    and

    $ x_2 = [\tilde{w}_2 \ \ b_2]^T. $

    The solution is similar to the linear case: construct the augmented Lagrangian function and follow the ADMM framework to update $ x $, $ \xi $, and the dual variable $ y. $
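
    As an illustration, the kernelized problem (3.8) can reuse exactly the same ADMM loop once the Gaussian kernel matrices are formed; in the sketch below, the helper `rbf_kernel`, the choice $ \sigma = 1 $, and treating $ X $ as the stacked training data of both classes are assumptions of this example.

    ```python
    import numpy as np

    def rbf_kernel(P, Q, sigma=1.0):
        """Gaussian RBF Gram matrix with entries exp(-||p_i - q_j||^2 / (2 sigma^2))."""
        sq = np.sum(P**2, axis=1)[:, None] + np.sum(Q**2, axis=1)[None, :] - 2.0 * P @ Q.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

    def kernel_lasso_lstsvm_plane1(A, B, tau1, rho=1.0, iters=200, sigma=1.0):
        """Sketch of the nonlinear case (3.8): the linear ADMM updates with G, H in place of F, E."""
        X = np.vstack([A, B])                                                # all training samples (assumption)
        G = np.hstack([rbf_kernel(A, X, sigma), np.ones((A.shape[0], 1))])   # G = [K(A,X)  e1]
        H = np.hstack([rbf_kernel(B, X, sigma), np.ones((B.shape[0], 1))])   # H = [K(B,X)  e2]
        e2 = np.ones(B.shape[0])
        xi2, y1 = np.zeros(B.shape[0]), np.zeros(B.shape[0])
        M = G.T @ G + rho * (H.T @ H)
        soft = lambda v, eps: np.sign(v) * np.maximum(np.abs(v) - eps, 0.0)
        for _ in range(iters):
            x1 = np.linalg.solve(M, H.T @ (rho * xi2 - rho * e2 - y1))
            xi2 = soft(H @ x1 + e2 + y1 / rho, tau1 / rho)
            y1 = y1 + rho * (H @ x1 - xi2 + e2)
        return x1[:-1], x1[-1]                                               # (tilde w1, b1)
    ```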

    As for complexity, solving the linear system at each iteration contributes $ O(n^3) $, while the proximal (soft-thresholding) operator is simple and costs only $ O(n) $. If $ k $ iterations are required, the total complexity is $ O(k \cdot (n^3+n)) $. For convex problems, $ k $ scales as $ O(1/\epsilon) $, so the total complexity becomes $ O\left(\frac{n^3}{\epsilon}\right). $ Convergence of the ADMM iterations is assessed through primal and dual feasibility: both the primal residual of the equality constraint and the dual residual associated with the Lagrangian (dual) variables must be driven to zero.
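
    In code, this feasibility check is typically the standard primal/dual residual test; a minimal sketch for the first-plane constraint of (3.3), with the tolerances chosen purely for illustration:

    ```python
    import numpy as np

    def admm_residuals(E, e2, x1, xi2, xi2_prev, rho):
        """Primal and dual residual norms for the constraint E x1 - xi2 + e2 = 0 (sketch)."""
        r_primal = E @ x1 - xi2 + e2                     # primal feasibility violation
        s_dual = rho * (E.T @ (xi2 - xi2_prev))          # dual feasibility violation
        return np.linalg.norm(r_primal), np.linalg.norm(s_dual)

    # inside the ADMM loop one might stop once both residuals are small, e.g.
    # r, s = admm_residuals(E, e2, x1, xi2, xi2_prev, rho)
    # if r < 1e-4 and s < 1e-4:
    #     break
    ```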

    Accelerated Lasso LSTSVM with guard condition. In this work, we employ an accelerated ADMM framework [15] designed to ensure a consistent reduction in the combined residual $ \gamma $ by introducing a guard condition. This condition regulates the acceleration process, allowing it to proceed when met and reverting to the standard ADMM iteration otherwise. This adaptive mechanism prevents potential instability or divergence that might arise from direct acceleration.

    Let $ u $ denote the target vector we aim to compute. The approximation of $ u $ at the $ k $-th iteration is represented by $ u^{(k)}. $ For clarity, $ \bar{u}^{(k)} $ represents the approximation of $ u $ computed using the standard ADMM method, while $ \hat{u}^{(k)} $ denotes the vector derived from an acceleration strategy applied to the ADMM process. In an acceleration step, the accelerated iterate $ \hat{u}^{(k)} $ is obtained by combining the current and previous standard approximations, often as follows:

    $ \hat{u}^{(k)} = \bar{u}^{(k)}+\alpha^{(k)}\left(\bar{u}^{(k)}-\bar{u}^{(k-1)}\right), $ (3.10)

    where $ \alpha^{(k)} $ is an adaptive parameter governing the acceleration. The proposed accelerated ADMM method, incorporating the guard condition, is formally outlined in Algorithm 3.

    Algorithm 3 Accelerated Lasso LSTSVM with guard condition.
    % initialize $ \xi_2, y_1 $
    $ \xi_2^{(0)} \leftarrow \tilde{\xi_2} $
    $ y_1^{(0)} \leftarrow \tilde{y_1} $
    for $ k = 0, 1, 2, \ldots $ do
      $ x_1^{(k+1)} = \left(F^TF+\rho E^TE\right)^{-1}\left[E^T\left(\rho \xi_2^{(k)}-\rho e_2-y_1^{(k)}\right)\right]; $
      $ \bar{\xi}_2^{(k+1)} = S_{\tau_1/\rho}\left(Ex_1^{(k+1)}+e_2+\dfrac{y_1^{(k)}}{\rho}\right) $
      $ \bar{y}_1^{(k+1)} = y_1^{(k)}+\rho\left(Ex_1^{(k+1)}-\bar{\xi}_2^{(k+1)}+e_2\right) $
      % Begin of acceleration steps
      $ \hat{\xi}_2^{(k+1)} = \bar{\xi}_2^{(k+1)} + \alpha_1^{(k)}\left(\bar{\xi}_2^{(k+1)}-\bar{\xi}_2^{(k)}\right) $
      $ \hat{y}_1^{(k+1)} = \bar{y}_1^{(k+1)} + \alpha_1^{(k)}\left(\bar{y}_1^{(k+1)}-\bar{y}_1^{(k)}\right) $
      % End of acceleration steps
      $ \gamma^{(k+1)} = \rho^{-1}\left\lVert{\hat{y}_1^{(k+1)}-\hat{y}_1^{(k)}}\right\rVert_2^2+\rho\left\lVert{\hat{\xi}_2^{(k+1)}-\hat{\xi}_2^{(k)}}\right\rVert_2^2 $
      % Begin of guard condition
      if $ \gamma^{(k+1)} < \gamma^{(0)} \eta^{k+1} $ then
        $ \xi_2^{(k+1)} = \hat{\xi}_2^{(k+1)} $
        $ y_1^{(k+1)} = \hat{y}_1^{(k+1)} $
      else
        $ \xi_2^{(k+1)} = \bar{\xi}_2^{(k+1)} $
        $ y_1^{(k+1)} = \bar{y}_1^{(k+1)} $
        $ \gamma^{(k+1)} = \rho^{-1} \left\lVert{y_1^{(k+1)} - y_1^{(k)}}\right\rVert_2^2 + \rho\left\lVert{\xi_2^{(k+1)}-\xi_2^{(k)}}\right\rVert_2^2 $
      end if
      % End of guard condition
    end for
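
    A NumPy sketch of Algorithm 3, using the stationary parameters discussed below ($ \alpha = 0.2 $, $ \eta = 0.85 $); the zero initialization, the iteration budget, and taking $ \gamma^{(0)} $ from the first iterate are implementation choices of this sketch.

    ```python
    import numpy as np

    def accelerated_lasso_lstsvm_plane1(A, B, tau1, rho=1.0, alpha=0.2, eta=0.85, iters=200):
        """Sketch of Algorithm 3: accelerated ADMM with a guard condition on the combined residual."""
        F = np.hstack([A, np.ones((A.shape[0], 1))])
        E = np.hstack([B, np.ones((B.shape[0], 1))])
        e2 = np.ones(B.shape[0])
        xi2, y1 = np.zeros(B.shape[0]), np.zeros(B.shape[0])
        xi2_bar_prev, y1_bar_prev = xi2.copy(), y1.copy()
        M = F.T @ F + rho * (E.T @ E)
        soft = lambda v, eps: np.sign(v) * np.maximum(np.abs(v) - eps, 0.0)
        gamma0 = None
        for k in range(iters):
            x1 = np.linalg.solve(M, E.T @ (rho * xi2 - rho * e2 - y1))        # standard x1-update
            xi2_bar = soft(E @ x1 + e2 + y1 / rho, tau1 / rho)                # standard xi2-update
            y1_bar = y1 + rho * (E @ x1 - xi2_bar + e2)                       # standard dual update
            xi2_hat = xi2_bar + alpha * (xi2_bar - xi2_bar_prev)              # acceleration step
            y1_hat = y1_bar + alpha * (y1_bar - y1_bar_prev)
            gamma = np.linalg.norm(y1_hat - y1)**2 / rho + rho * np.linalg.norm(xi2_hat - xi2)**2
            if gamma0 is None:
                gamma0 = gamma                                                # gamma^{(0)} (assumption)
            if gamma < gamma0 * eta**(k + 1):                                 # guard condition satisfied
                xi2, y1 = xi2_hat, y1_hat
            else:                                                             # revert to standard ADMM step
                xi2, y1 = xi2_bar, y1_bar
            xi2_bar_prev, y1_bar_prev = xi2_bar, y1_bar
        return x1[:-1], x1[-1]
    ```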

    In our algorithm, the guard condition is designed to monitor the progress of the accelerated ADMM updates and decide whether to accept the acceleration or revert to standard ADMM updates. To make this decision robust and effective, we adopt the parameter selection strategy proposed in [15]. Specifically, the threshold parameter $ \eta \in (0, 1) $ is used to determine whether the accelerated update yields sufficient improvement in the combined residual. If the reduction is less than a factor of $ \eta, $ the algorithm rejects the acceleration step and reverts to the previous iterate.

    In this work, we set $ \eta = 0.85, $ following the empirical recommendation in [15]. This choice reflects a balanced trade-off between acceleration and stability. Additionally, the momentum parameter $ \alpha^{(k)} $ is used to control the acceleration step size. Following the stationary acceleration approach proposed in [15], we fix

    $ \alpha^{(k)} = \alpha $

    for all iterations, rather than using a dynamically updated rule. This simplification reduces computational overhead and avoids potential oscillations caused by increasing momentum.

    In particular, [15] provides a convergence proof under the condition that

    $ \alpha < 1/3, $

    ensuring the stability of the accelerated scheme. Based on this, we select a fixed value of $ \alpha $ within this bound, e.g.,

    $ \alpha = 0.2 $

    to maintain theoretical convergence guarantees while benefiting from the improved convergence speed that acceleration offers.

    This approach achieves faster convergence by leveraging the vectors computed from standard ADMM iterations as a reference for acceleration steps. Notably, selecting appropriate parameters for the guard condition is crucial to optimizing the method's overall efficiency.

    This section presents a comparative study evaluating the effectiveness of the accelerated Lasso LSTSVM using ADMM with both linear and nonlinear kernels. We assess the classification accuracy and computational efficiency of the standard LSTSVM, Lasso LSTSVM, and its accelerated variant.

    For experiments with linear kernels, we utilize datasets from the UCI machine learning repository [4], including ionosphere, breast cancer, Pima Indians, dry bean, satellite, predict students' dropout and academic success (PSDA), and QSAR biodegradation. For nonlinear classification, we employ EEG eye state and magic telescope (also from the UCI repository), along with the moon dataset—a synthetic dataset from Kaggle [16]—and the electricity dataset [17], a real-world dataset obtained from OpenML.

    All datasets are designed for binary or multi-class classification tasks and span diverse domains such as medicine, physics, and social sciences. Their varying characteristics in terms of sample size, feature dimension, and complexity provide a robust foundation for evaluating the generalization performance of the proposed models across different scenarios.

    As illustrated in Figure 1, a synthetic dataset was constructed to evaluate model performance on linearly separable data. The dataset consists of 500 samples with 2 features and 2 classes. Orange points represent Class 1, while blue points represent Class 2. To simulate challenging conditions, we introduced outliers by mislabeling 10% of the data—an intentionally high proportion, considering that outliers typically account for less than 5% of real-world datasets.

    Figure 1.  A synthetic dataset with outliers that lead to misclassification.

    We then evaluated the models by splitting the dataset into 80% training and 20% testing data. The results show that Lasso LSTSVM achieved an accuracy of 94%, compared to 92% from the standard LSTSVM. This highlights the Lasso model's robustness to label noise and its ability to generalize better in the presence of significant outlier contamination.
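
    A sketch of how such a contaminated dataset can be generated with NumPy; the two Gaussian blobs, the blob centers, and the random seed are illustrative assumptions rather than the exact construction used for Figure 1.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # two linearly separable 2-D blobs, 250 samples per class (illustrative construction)
    X_pos = rng.normal(loc=[+2.0, +2.0], scale=0.8, size=(250, 2))
    X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.8, size=(250, 2))
    X = np.vstack([X_pos, X_neg])
    y = np.hstack([np.ones(250), -np.ones(250)])

    # contaminate: flip the labels of 10% of the points to act as outliers
    flip = rng.choice(len(y), size=int(0.10 * len(y)), replace=False)
    y[flip] *= -1

    # 80/20 train/test split
    perm = rng.permutation(len(y))
    cut = int(0.8 * len(y))
    train_idx, test_idx = perm[:cut], perm[cut:]
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    A = X_train[y_train == +1]     # class +1 matrix for (Lasso) LSTSVM training
    B = X_train[y_train == -1]     # class -1 matrix
    ```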

    The classification results are influenced by the selection of penalty parameters, which are set to specific values for linear and nonlinear kernels to ensure a fair comparison. For nonlinear classifiers, the Gaussian RBF kernel is utilized, with the kernel function defined as

    $ K(x_i, x_j) = \text{exp}\left(-\left\lVert{x_i-x_j}\right\rVert^2/2\sigma^2\right), $

    with the kernel parameter $ \sigma = 1 $ due to its well-established capability to model complex, nonlinear relationships in high-dimensional feature spaces. It is a widely used choice in kernel-based learning methods because it introduces locality and smoothness in decision boundaries, which is beneficial for datasets with intricate structures, such as EEG eye state and magic telescope.

    In our experiments, we fixed the kernel parameter to $ \sigma = 1 $ for consistency and to avoid additional hyperparameter tuning that could overshadow the performance comparison between models. However, we acknowledge that the performance of RBF-based models is sensitive to the choice of $ \sigma. $ A very small $ \sigma $ may lead to overfitting by capturing noise, while a large $ \sigma $ can oversmooth decision boundaries and underfit the data. The experimental results are summarized in Tables 1–4, where $ m $ and $ n $ indicate the number of training samples and feature dimensions, respectively.

    Table 1.  Accuracy $ \pm $ std (%) comparisons of two algorithms using linear kernel.
    Dataset $ (m\times n) $ Standard LSTSVM Lasso LSTSVM
    Ionosphere ($351 \times 33$) $80.00\pm4.67$ $90.91\pm5.70$
    Breast cancer ($558 \times 5$) $92.47\pm2.04$ $91.58\pm2.55$
    Pima indians ($768 \times 8$) $76.71\pm6.27$ $85.53\pm6.27$
    Dry Bean ($5573 \times 16$) $97.78\pm0.53$ $97.76\pm0.68$
    Satellite ($5100 \times 36$) $99.27\pm0.32$ $99.18\pm0.34$
    PSDA ($3630 \times 36$) $90.77\pm1.50$ $91.19\pm1.24$
    QSAR ($1055 \times 42$) $86.13\pm4.69$ $87.94\pm3.05$

    Table 2.  Accuracy $ \pm $ std (%) comparisons of two algorithms using Gaussian kernel.
    Dataset $ (m\times n) $ Standard LSTSVM Lasso LSTSVM
    Ionosphere ($351 \times 33$) $80.00\pm4.67$ $90.91\pm5.70$
    Breast cancer ($558 \times 5$) $92.47\pm2.04$ $91.58\pm2.55$
    Pima indians ($768 \times 8$) $76.71\pm6.27$ $85.53\pm6.27$
    Dry Bean ($5573 \times 16$) $97.78\pm0.53$ $97.76\pm0.68$

    Table 3.  Performance comparisons of two algorithms using linear kernel.
    Dataset $(m\times n)$ | Lasso LSTSVM Acc $\pm$ std (%) | time (Sec.) | Accelerated Lasso LSTSVM Acc $\pm$ std (%) | time (Sec.)
    Ionosphere ($351 \times 33$) | $90.91\pm5.70$ | $1.12$ | $85.71\pm5.33$ | $0.84$
    Breast cancer ($558 \times 5$) | $91.58\pm2.55$ | $4.32$ | $91.22\pm2.46$ | $4.50$
    Pima indians ($768 \times 8$) | $85.53\pm6.27$ | $4.93$ | $85.53\pm6.26$ | $4.42$
    Dry Bean ($5573 \times 16$) | $97.76\pm0.68$ | $10.49$ | $97.54\pm0.54$ | $6.80$
    Satellite ($5100 \times 36$) | $99.18\pm0.34$ | $37.49$ | $99.31\pm0.35$ | $7.25$
    PSDA ($3630 \times 36$) | $91.19\pm1.24$ | $9.83$ | $91.74\pm1.44$ | $3.95$
    QSAR ($1055 \times 42$) | $87.94\pm3.05$ | $4.37$ | $86.96\pm3.93$ | $2.50$
    Table 4.  Performance comparisons of two algorithms using Gaussian kernel.
    Dataset $(m\times n)$ | Lasso LSTSVM Acc $\pm$ std (%) | time (Sec.) | Accelerated Lasso LSTSVM Acc $\pm$ std (%) | time (Sec.)
    Moon ($200\times 3$) | $95.50\pm6.43$ | $1.91$ | $96.00\pm6.58$ | $1.82$
    EEG eye state ($14979 \times 14$) | $85.10\pm0.76$ | $141.26$ | $85.69\pm0.62$ | $117.74$
    Magic telescope ($19020 \times 10$) | $77.08\pm0.62$ | $116.39$ | $77.18\pm0.75$ | $80.15$
    Electricity ($45312 \times 6$) | $73.08\pm0.81$ | $65.76$ | $72.86\pm0.96$ | $84.96$

    The datasets used in this study exhibit varying degrees of outlier presence. Based on the interquartile range (IQR) method [18], most datasets were found to contain a moderate to high number of outliers. Notably, the Pima Indians and QSAR datasets have a particularly high concentration of outliers. For Pima Indians, this is primarily due to missing or implausible values treated as real numbers—features such as insulin, BMI, skin thickness, and blood pressure often contain zeros, which are physiologically impossible and act as placeholders for missing data but are not properly handled in the raw dataset. The QSAR dataset, on the other hand, naturally includes outliers due to its chemical diversity, high dimensionality, and non-normal feature distributions.
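
    For reference, a minimal sketch of the IQR rule used to flag outliers, assuming the conventional 1.5$\times$IQR whiskers:

    ```python
    import numpy as np

    def iqr_outlier_count(X, whisker=1.5):
        """Count samples with at least one feature outside [Q1 - w*IQR, Q3 + w*IQR]."""
        q1 = np.percentile(X, 25, axis=0)
        q3 = np.percentile(X, 75, axis=0)
        iqr = q3 - q1
        lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
        mask = (X < lower) | (X > upper)          # True where a feature value is an outlier
        return int(np.any(mask, axis=1).sum())    # number of samples flagged as outliers
    ```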

    Table 1 compares the classification accuracy of the standard LSTSVM and the Lasso LSTSVM using a 10-fold cross-validation approach. Results show that Lasso LSTSVM outperforms the standard LSTSVM on datasets such as ionosphere, Pima Indians, PSDA, and QSAR, all of which contain relatively high levels of outliers. In contrast, for datasets like breast cancer, dry bean, and satellite, which exhibit fewer outliers, the performance gap between the models is narrower.
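
    The evaluation protocol can be sketched as follows; `train_model` and `predict` are hypothetical placeholders standing in for whichever classifier is being assessed, and the shuffling seed is arbitrary.

    ```python
    import numpy as np

    def cross_validate(X, y, train_model, predict, k=10, seed=0):
        """Return mean and std of accuracy (%) over k folds (sketch)."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        folds = np.array_split(idx, k)
        accs = []
        for i in range(k):
            test = folds[i]
            train = np.hstack([folds[j] for j in range(k) if j != i])
            model = train_model(X[train], y[train])
            accs.append(np.mean(predict(model, X[test]) == y[test]))
        return float(np.mean(accs)) * 100, float(np.std(accs)) * 100
    ```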

    As shown in Table 2, aside from the moon dataset, which is synthetic and generated from Kaggle, the remaining three datasets (EEG eye state, magic telescope, and electricity) contain a considerable number of outliers. Interestingly, the standard LSTSVM performs better than the Lasso LSTSVM on the Moon dataset, while Lasso LSTSVM shows higher accuracy for EEG eye state and magic telescope. However, there are cases—such as the electricity dataset—where the standard LSTSVM still performs better despite the presence of outliers. This can be attributed to the nature of the outliers in this dataset, which result from natural fluctuations rather than sensor errors or data entry mistakes. In such cases, the $ L_2 $-norm of the standard LSTSVM may better capture the underlying data structure, whereas the $ L_1 $-norm regularization of Lasso LSTSVM might overly penalize these variations, potentially missing broader trends.

    Tables 3 and 4 compare the computational time between the Lasso LSTSVM and the Accelerated Lasso LSTSVM models. The results clearly demonstrate that, for both linear and Gaussian RBF kernels, the accelerated Lasso LSTSVM significantly reduces computation time while maintaining classification accuracy comparable to that of the standard Lasso LSTSVM. This highlights the effectiveness of the proposed acceleration strategy, particularly for large-scale or high-dimensional datasets where computational efficiency is critical.

    It is widely understood that in regression, ridge regression is used when many predictors or independent variables may be relevant, particularly in multicollinear contexts. In contrast, Lasso regression is employed when we suspect that only a few predictors are significant, allowing for effective feature selection. However, ridge regularization is more sensitive to outliers because it minimizes squared errors, which can lead to biased coefficients that are heavily influenced by those outliers. This principle applies to both the standard LSTSVM models and the Lasso LSTSVM models. Consequently, existing LSTSVM models are likely more sensitive to outliers than the Lasso LSTSVM models.

    However, in some instances the standard LSTSVM outperformed the Lasso LSTSVM. This may be due to the standard LSTSVM's ability to handle a larger number of relevant predictors or independent variables more effectively. The standard LSTSVM shrinks the coefficients of irrelevant features without eliminating any of them entirely, so if a dataset contains many irrelevant features, it may produce a complex model in which none of the irrelevant coefficients are reduced to zero. Conversely, when most of the independent variables are genuinely relevant, the standard LSTSVM can perform better than the Lasso LSTSVM, as observed in some of our examples.

    This research presents a comparative analysis of three classification models: the standard LSTSVM, the Lasso LSTSVM, and the proposed accelerated Lasso LSTSVM based on the ADMM framework. The study compares the accuracy of the standard LSTSVM with the Lasso LSTSVM, and compares the computational time of the Lasso LSTSVM with the accelerated Lasso LSTSVM. The inclusion of $ L_1 $-norm regularization makes the model more robust to outliers, enabling it to adapt well to datasets with a high number of outliers. Additionally, the acceleration step in the ADMM process reduces computation time without compromising classification accuracy.

    Experimental results on several benchmark datasets, both linear and nonlinear, show that the proposed model outperforms the standard LSTSVM in terms of accuracy. Although there are cases where the standard LSTSVM achieves better accuracy than the Lasso type, this can be attributed to various factors revealed through analysis. However, there remains potential for improvement in computational efficiency, particularly for large-scale datasets. The accelerated model effectively reduces the number of iterations and training time compared to the non-accelerated version. Furthermore, the guard condition applied with the acceleration step ensures the algorithm's stability and guarantees reliable results. This study demonstrates that combining robustness to outliers with ADMM tuning and acceleration steps produces an efficient model that is well-suited for real-world data applications. Future research could extend this work by developing adaptive acceleration techniques and investigating theoretical convergence guarantees to achieve even faster and more reliable algorithms for practical use.

    Thanasak Mouktonglang: conceptualization, methodology, validation, formal analysis, investigation, data curation, writing - review and editing, supervision, funding acquisition; Rujira Fongmun: methodology, software, validation, investigation, data curation, writing - original draft, writing - review and editing. All authors have read and agreed to the published version of the manuscript.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This research was partially supported by the Faculty of Science, Chiang Mai University. The first author was supported by the Development and Promotion of Science and Technology Talents Project (DPST) scholarship.

    The authors declare no conflicts of interest.


