Research article

The maximum residual block Kaczmarz algorithm based on feature selection

  • Received: 03 January 2025 Revised: 26 February 2025 Accepted: 07 March 2025 Published: 20 March 2025
  • MSC : 15A06, 65F10, 65F20

  • Based on the K-means method, an effective block-row partition algorithm was proposed in [1], which partitions the rows of the coefficient matrix $A\in\mathbb{R}^{m\times n}$. However, as the size of the coefficient matrix grows, the time required for the partitioning process increases significantly. To address this problem, we considered selecting features from the columns of the matrix $A$ to obtain a low-rank matrix $\tilde{A}\in\mathbb{R}^{m\times d}$ $(d\ll n)$. Lasso is a regression analysis method for feature selection, which is simple and handles high-dimensional data well. In view of this, we first introduced a new criterion for selecting the projection block and proposed the maximum residual block Kaczmarz algorithm. Then, we put forward a feature selection algorithm based on Lasso and further presented a maximum residual block Kaczmarz algorithm based on feature selection. We analyzed the convergence of these algorithms and demonstrated their effectiveness through numerical results, while also verifying the performance of the proposed algorithms in image reconstruction.

    Citation: Ran-Ran Li, Hao Liu. The maximum residual block Kaczmarz algorithm based on feature selection[J]. AIMS Mathematics, 2025, 10(3): 6270-6290. doi: 10.3934/math.2025286




    Research on solving large linear systems $Ax=b$, with $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$, has a long and rich history. Among the various solution approaches, iterative methods stand out for their efficiency and stability [2,3,4], particularly for large linear systems. The Kaczmarz method [5] is a typical representative and has been widely used in many fields, such as computerized tomography, signal processing, and image reconstruction [6,7,8,9].

    A randomized version of the Kaczmarz method was proposed in [10], known as the randomized Kaczmarz (RK) method, which selects rows of the coefficient matrix at random instead of cyclically. It has been proven that this method possesses an expected linear convergence rate. In [11], a sharper upper bound on the convergence rate of the RK method was provided. To enhance the convergence speed, Bai and Wu presented an effective probability criterion that favors the larger entries of the residual vector at each iteration, based on which the greedy randomized Kaczmarz (GRK) method [12] was proposed; they demonstrated that its convergence rate is significantly faster than that of the RK method. Based on the GRK method, the relaxed GRK method was introduced in [13], and a non-randomized greedy Kaczmarz method was presented in [14]. For more discussions on the RK method, see also [15,16,17,18] and the references therein. References [19] and [20] provide a recent comprehensive investigation and review of Kaczmarz-type methods. In addition, the RK method has been applied to tensor equations, as detailed in [21,22,23].

    In many settings, the block Kaczmarz method [24], which uses several rows of the coefficient matrix at each iteration step, can be implemented more efficiently than the Kaczmarz method. To improve the convergence speed, Needell and Tropp presented the randomized block Kaczmarz method [25], which has an expected linear rate of convergence. Afterwards, Necoara developed the randomized average block Kaczmarz method [26], which can be deployed on distributed computing units. Miao and Wu proposed the greedy randomized average block Kaczmarz method [27], which can also be implemented in a distributed environment, and analyzed two kinds of extrapolated step sizes. For block versions of the RK method, see also [28,29,30,31] and the references therein. A partition strategy based on the K-means method was proposed in [1,32], which uses the cosine distance to measure the similarity between the row vectors of the coefficient matrix. Since the cosine distance captures both the distance and the directional similarity between two vectors, this strategy often yields better partitioning results. However, when the coefficient matrix is very large, the time required for partitioning increases significantly. Therefore, we consider utilizing a feature selection method to obtain a low-rank matrix, thereby improving the solving efficiency.

    Lasso (least absolute shrinkage and selection operator) [33] is a regression analysis method that performs both variable selection and parameter estimation by shrinking coefficients with small absolute values to zero through a penalty term. Based on the idea of Lasso, we consider selecting features from the columns of the coefficient matrix $A\in\mathbb{R}^{m\times n}$ to obtain the low-rank matrix $\tilde{A}\in\mathbb{R}^{m\times d}$ $(d\ll n)$. Therefore, our objective is to solve the following problem:

    $$\min_{\beta}\left\{\frac{1}{2}\|b-A\beta\|_2^2+\lambda\|\beta\|_1\right\}, \tag{1.1}$$

    where $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$, $\beta\in\mathbb{R}^n$, and $\lambda>0$. By adjusting the value of $\lambda$, the sparsity of the vector $\beta$ can be controlled, and its non-zero entries correspond to the positions of the selected features (columns) of the coefficient matrix $A$. Let $\beta^{(j)}$ represent the $j$-th entry of $\beta$, and construct the index set of non-zero entries in $\beta$, denoted by

    $$D=\left\{\tilde{j}\,\middle|\,\beta^{(j)}\neq 0,\ j=1,2,\ldots,n\right\}, \tag{1.2}$$

    and the number of elements in $D$ is $d$. The low-rank matrix $\tilde{A}\in\mathbb{R}^{m\times d}$ can then be obtained by taking the columns of $A$ indexed by $D$. We partition the row vectors of the matrix $\tilde{A}$ instead of those of $A$, which improves the efficiency of the partition process. This process of feature selection and partitioning is shown in Figure 1.

    Figure 1.  The process of feature selection and partitioning.

    By partitioning the row vectors of the matrix $\tilde{A}$, we obtain the block index set $\tilde{J}=[\tilde{J}_1,\ldots,\tilde{J}_q]$. This index set also applies to the row vectors of the coefficient matrix $A$, which is thus divided into $q$ blocks $A_{\tilde{J}_1},\ldots,A_{\tilde{J}_q}$. Compared to directly partitioning the row vectors of $A$, partitioning those of $\tilde{A}$ can significantly reduce the time required for the partitioning process.
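
    As a concrete illustration of (1.2) and of how $\tilde{A}$ is formed, the MATLAB sketch below builds the index set $D$ from an already-computed sparse coefficient vector and extracts the corresponding columns of $A$; the sizes and the random data are placeholders.

```matlab
% Illustration of (1.2): build D from a sparse coefficient vector beta
% (assumed to have been computed by a Lasso-type solver) and extract the
% reduced matrix Atilde whose columns are the columns of A indexed by D.
m = 500;  n = 3000;
A    = randn(m, n);                  % placeholder coefficient matrix
beta = full(sprandn(n, 1, 0.01));    % placeholder sparse Lasso coefficients
D      = find(beta ~= 0);            % index set D, with d = numel(D) elements
Atilde = A(:, D);                    % Atilde in R^{m x d}, d << n
```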

    In this paper, we first introduce a new criterion for selecting the projection block, namely, choosing the block that contains the row with the maximum modulus of the residual vector, and propose the maximum residual block Kaczmarz algorithm. Then, we present a feature selection algorithm based on Lasso and further put forward the maximum residual block Kaczmarz algorithm based on feature selection. The convergence of these algorithms is analyzed, and numerical results demonstrate their effectiveness. In addition, we also verify the performance of both algorithms in image reconstruction.

    For a matrix $A=(a_{ij})\in\mathbb{R}^{m\times n}$, we use $A^{\top}$, $A^{\dagger}$, $\|A\|_2$, $A^{(i)}$, $\sigma_{\min}(A)$, and $\sigma_{\max}(A)$ to denote its transpose, Moore-Penrose pseudoinverse, spectral norm, $i$-th row, smallest nonzero singular value, and largest singular value, respectively. Let $[m]:=\{1,2,\ldots,m\}$ and $J=\{J_1,J_2,\ldots,J_q\}$ with $J_v\cap J_s=\emptyset$ for $v\neq s$, $v,s\in\{1,2,\ldots,q\}$, and $\bigcup_{s=1}^{q}J_s=[m]$. $A_{J_s}$ and $|J_s|$ denote the row submatrix indexed by $J_s$ and the number of rows contained in the block $J_s$, respectively. When generating test images, $N$, $\theta$, and $p$ denote the number of pixels, the scanning angle, and the number of X-ray beams, respectively. The cosine distance [1] between the $n$-dimensional vectors $x_1$ and $x_2$ is denoted by $d_c(x_1,x_2)$.

    The soft-thresholding operator [34] is a commonly used feature selection method that allows for screening and compression of features based on their importance, particularly in solving the Lasso estimator. By setting feature values below a certain threshold to zero and filtering out features that have minimal or redundant impact on the target variable, the dimensionality of the data is effectively reduced. This approach reduces computational complexity and enhances the simplicity and effectiveness of the model. The soft-thresholding operator is defined as

    $$S_{\lambda}(x)=\operatorname{sign}(x)\,(|x|-\lambda)_{+},$$

    where $x\in\mathbb{R}^n$ and $t_{+}$ denotes the positive part of $t\in\mathbb{R}$. It has a simple explicit entrywise expression, given by

    $$S_{\lambda}(x_j)=\begin{cases}x_j-\lambda, & x_j>\lambda,\\ 0, & |x_j|\leq\lambda,\\ x_j+\lambda, & x_j<-\lambda.\end{cases}$$
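
    For reference, a minimal MATLAB realization of the soft-thresholding operator is given below; the vectorized anonymous function applies $S_\lambda$ entrywise.

```matlab
% Entrywise soft-thresholding: S_lambda(x) = sign(x) .* max(abs(x) - lambda, 0).
soft = @(x, lambda) sign(x) .* max(abs(x) - lambda, 0);

% Example: entries with magnitude at most lambda are set to zero,
% the others are shrunk toward zero by lambda.
soft([2; -0.3; 0.8; -1.5], 1)        % returns [1; 0; 0; -0.5]
```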

    K-means clustering is a powerful tool for exploratory data analysis and data compression, and it is particularly valuable for identifying natural groupings within the data. Based on the clustering idea of the K-means method, an efficient partitioning strategy was presented in [1]. The block-row partitions algorithm is as follows.

    Algorithm 2.1. The block-row partitions algorithm.
    Input: $A\in\mathbb{R}^{m\times n}$ and the number of blocks $q$ $(q<m)$
    Output: $A_{J_1},A_{J_2},\ldots,A_{J_q}$
    (1) Randomly select $q$ rows of the matrix $A$ as the initial block centers $\bar{x}_1,\bar{x}_2,\ldots,\bar{x}_q$, where $\bar{x}_s\in\mathbb{R}^{1\times n}$ $(s=1,\ldots,q)$;
    (2) Repeat
    (3)      Assign each row $A^{(i)}$ $(i=1,2,\ldots,m)$ to the block $A_{J_l}$ with the nearest center, i.e.,
    $d_c(A^{(i)},\bar{x}_l)=\min_{s}\,d_c(A^{(i)},\bar{x}_s)$ $(s=1,2,\ldots,q)$;
    (4)      Update the block centers $\bar{x}_s=\frac{\sum_{A^{(i)}\in A_{J_s}}A^{(i)}}{|A_{J_s}|}$, where $|A_{J_s}|$ is the number of row vectors in the block $J_s$;
    (5)      Compute the criterion function $C=\sum_{s=1}^{q}\sum_{x\in A_{J_s}}d_c(x,\bar{x}_s)^2$;
    (6) Until $C$ no longer changes; the blocks $A_{J_s}$ are generated.
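
    The following MATLAB sketch mirrors Algorithm 2.1. It assumes the common convention $d_c(x,y)=1-\frac{xy^{\top}}{\|x\|_2\|y\|_2}$ for the cosine distance, which may differ in minor details from the definition used in [1]; the function and variable names are our own.

```matlab
function J = block_row_partition(A, q, maxit)
% Sketch of Algorithm 2.1: K-means-style block-row partitioning of A under a
% cosine distance. Returns a cell array J, where J{s} holds the row indices
% of the s-th block.
[m, ~] = size(A);
centers = A(randperm(m, q), :);                  % step (1): random initial centers
Cold = Inf;
lbl = ones(m, 1);
for it = 1:maxit
    % step (3): assign each row to the block with the nearest center
    An = A ./ vecnorm(A, 2, 2);
    Cn = centers ./ vecnorm(centers, 2, 2);
    dist = 1 - An * Cn';                         % m-by-q matrix of cosine distances
    [dmin, lbl] = min(dist, [], 2);
    % step (4): update each center as the average of its block's rows
    for s = 1:q
        if any(lbl == s)
            centers(s, :) = mean(A(lbl == s, :), 1);
        end
    end
    % steps (5)-(6): stop once the criterion function no longer changes
    C = sum(dmin.^2);
    if abs(C - Cold) < eps, break; end
    Cold = C;
end
J = arrayfun(@(s) find(lbl == s), 1:q, 'UniformOutput', false);
end
```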

    Based on the block-row partitions algorithm, we can obtain $q$ submatrices (referred to as blocks) of the coefficient matrix $A$, i.e., $A_{J_1},A_{J_2},\ldots,A_{J_q}$. Thereafter, the selection of the projection block becomes our main focus. Inspired by the GRK method, when solving large linear systems, the rows whose residual entries have large modulus are given priority in the iteration to achieve faster convergence. However, in a block iteration method, the components of the residual vector within each block can be uneven, so choosing an appropriate aggregate residual measure to represent an entire block is not straightforward. Therefore, we introduce a criterion for selecting the projection block: we select the block that contains the row with the maximum modulus of the residual vector.

    At the $k$-th iterate, the row index of the maximum modulus of the residual vector is denoted by $h_k$, given by

    $$h_k=\arg\max_{1\leq i\leq m}\left|b^{(i)}-A^{(i)}x_k\right|, \tag{3.1}$$

    which satisfies $h_k\in J_{s_k}$. Afterward, the current iterate $x_k$ is orthogonally projected onto the solution set of $A_{J_{s_k}}x=b_{J_{s_k}}$. For any initial vector $x_0$, the maximum residual block Kaczmarz (MBK) algorithm can be formulated as

    $$x_{k+1}=x_k+A_{J_{s_k}}^{\dagger}\left(b_{J_{s_k}}-A_{J_{s_k}}x_k\right),\quad k=0,1,2,\ldots, \tag{3.2}$$

    where $J_{s_k}$ is the block containing the row index $h_k$ defined in (3.1). The maximum residual block Kaczmarz algorithm is given as follows.

    Algorithm 3.1. The maximum residual block Kaczmarz algorithm: MBK.
    Input: $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$, $l$, $q$ $(q<m)$, and $x_0$
    Output: $x_l$
    (1) The blocks $A_{J_s},b_{J_s}$ $(s=1,2,\ldots,q)$ are obtained by Algorithm 2.1 and satisfy $J_v\cap J_s=\emptyset$ for $v\neq s$, $v,s\in\{1,2,\ldots,q\}$, and $\bigcup_{s=1}^{q}J_s=[m]$;
    (2) for $k=0,1,\ldots,l-1$ do
    (3)      Compute $r_k=b-Ax_k$;
    (4)      Select the projection block $A_{J_{s_k}}$ satisfying
    $h_k=\arg\max_{1\leq i\leq m}|r_k^{(i)}|,\quad h_k\in J_{s_k};$
    (5)      Set $x_{k+1}=x_k+A_{J_{s_k}}^{\dagger}\left(b_{J_{s_k}}-A_{J_{s_k}}x_k\right)$;
    (6) end for
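
    A compact MATLAB sketch of the MBK iteration (steps (2)-(6) above) follows; it assumes the block index sets are supplied as a cell array, for example by the block_row_partition sketch given earlier, and the function name and the stopping test are ours.

```matlab
function x = mbk(A, b, J, l, x0)
% Sketch of Algorithm 3.1 (MBK): at each step, project onto the block that
% contains the row with the largest residual modulus. J is a cell array of
% row-index sets produced, e.g., by a block-row partitioning routine.
x = x0;
for k = 1:l
    r = b - A * x;                                   % step (3): residual
    [~, h] = max(abs(r));                            % row with maximum |r^(i)|
    s = find(cellfun(@(Js) ismember(h, Js), J), 1);  % block J_{s_k} containing h
    AJ = A(J{s}, :);
    x = x + pinv(AJ) * (b(J{s}) - AJ * x);           % step (5): projection (3.2)
    if norm(r)^2 / norm(b)^2 <= 1e-4, break; end     % stop on small relative residual
end
end
```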

    Theorem 3.1. Let the large linear system $Ax=b$, with the coefficient matrix $A\in\mathbb{R}^{m\times n}$ and the right-hand side $b\in\mathbb{R}^m$, be consistent. The iteration sequence $\{x_k\}_{k=0}^{\infty}$ generated by the MBK algorithm starting from any initial vector $x_0\in\mathcal{R}(A^{\top})$ converges to the unique least-norm solution $x_\star=A^{\dagger}b$. Moreover, the solution error for the iteration sequence $\{x_k\}_{k=0}^{\infty}$ obeys

    $$\|x_k-x_\star\|_2^2\leq\left(1-\frac{\sigma_{\min}^2(A)}{\hat{\sigma}^2\left(m-\hat{N}_{\min}\right)}\right)^{k}\|x_0-x_\star\|_2^2, \tag{3.3}$$

    where $\hat{\sigma}=\max_{J_{s_k}}\left\{\sigma_{\max}(A_{J_{s_k}})\right\}$ and $\hat{N}_{\min}=\min_{J_{s_{k-1}}}\left\{|J_{s_{k-1}}|\right\}$ (defining $|J_{s_{-1}}|=0$).

    Proof. From the definition of the maximum residual block Kaczmarz algorithm, for $k=0,1,2,\ldots$, we have

    $$x_{k+1}-x_k=A_{J_{s_k}}^{\dagger}A_{J_{s_k}}(x_\star-x_k).$$

    Since $A_{J_{s_k}}x_\star=b_{J_{s_k}}$, we immediately get

    $$x_{k+1}-x_\star=\left(I-A_{J_{s_k}}^{\dagger}A_{J_{s_k}}\right)(x_k-x_\star).$$

    Based on the properties of the Moore-Penrose pseudoinverse, we can obtain

    $$(x_\star-x_k)^{\top}\left(A_{J_{s_k}}^{\dagger}A_{J_{s_k}}\right)^{\top}\left(I-A_{J_{s_k}}^{\dagger}A_{J_{s_k}}\right)(x_k-x_\star)=(x_\star-x_k)^{\top}\left(A_{J_{s_k}}^{\dagger}A_{J_{s_k}}-A_{J_{s_k}}^{\dagger}A_{J_{s_k}}A_{J_{s_k}}^{\dagger}A_{J_{s_k}}\right)(x_k-x_\star)=0,$$

    and it follows that

    $$\|x_{k+1}-x_\star\|_2^2=\|x_k-x_\star\|_2^2-\left\|A_{J_{s_k}}^{\dagger}A_{J_{s_k}}(x_\star-x_k)\right\|_2^2. \tag{3.4}$$

    From the definition of the MBK algorithm, again, we know that

    $$\left|b^{(h_k)}-A^{(h_k)}x_k\right|^2=\max_{1\leq i\leq m}\left|b^{(i)}-A^{(i)}x_k\right|^2,\quad h_k\in J_{s_k}, \tag{3.5}$$

    and

    $$\begin{aligned}
    \|b-Ax_k\|_2^2&=\sum_{u\in J_{s_{k-1}}}\left|b^{(u)}-A^{(u)}x_k\right|^2+\sum_{v\in[m]\setminus J_{s_{k-1}}}\left|b^{(v)}-A^{(v)}x_k\right|^2\\
    &=\left\|b_{J_{s_{k-1}}}-A_{J_{s_{k-1}}}\left(x_{k-1}+A_{J_{s_{k-1}}}^{\dagger}\left(b_{J_{s_{k-1}}}-A_{J_{s_{k-1}}}x_{k-1}\right)\right)\right\|_2^2+\sum_{v\in[m]\setminus J_{s_{k-1}}}\left|b^{(v)}-A^{(v)}x_k\right|^2\\
    &=\left\|A_{J_{s_{k-1}}}x_\star-A_{J_{s_{k-1}}}x_{k-1}-A_{J_{s_{k-1}}}A_{J_{s_{k-1}}}^{\dagger}A_{J_{s_{k-1}}}x_\star+A_{J_{s_{k-1}}}A_{J_{s_{k-1}}}^{\dagger}A_{J_{s_{k-1}}}x_{k-1}\right\|_2^2+\sum_{v\in[m]\setminus J_{s_{k-1}}}\left|b^{(v)}-A^{(v)}x_k\right|^2\\
    &=\sum_{v\in[m]\setminus J_{s_{k-1}}}\left|b^{(v)}-A^{(v)}x_k\right|^2\\
    &\leq\left(m-|J_{s_{k-1}}|\right)\max_{v\in[m]\setminus J_{s_{k-1}}}\left\{\left|b^{(v)}-A^{(v)}x_k\right|^2\right\}\\
    &\leq\left(m-|J_{s_{k-1}}|\right)\left|b^{(h_k)}-A^{(h_k)}x_k\right|^2,
    \end{aligned} \tag{3.6}$$

    where $|J_{s_{k-1}}|$ represents the number of rows contained in the block $J_{s_{k-1}}$ and $[m]\setminus J_{s_{k-1}}$ denotes the set difference between $[m]$ and $J_{s_{k-1}}$. Combining (3.4), (3.5), and (3.6) leads to the inequality

    $$\begin{aligned}
    \|x_{k+1}-x_\star\|_2^2&=\|x_k-x_\star\|_2^2-\left\|A_{J_{s_k}}^{\dagger}A_{J_{s_k}}(x_\star-x_k)\right\|_2^2\\
    &\leq\|x_k-x_\star\|_2^2-\sigma_{\min}^2\left(A_{J_{s_k}}^{\dagger}\right)\sum_{i_k\in J_{s_k}}\left|A^{(i_k)}(x_\star-x_k)\right|^2\\
    &\leq\|x_k-x_\star\|_2^2-\sigma_{\min}^2\left(A_{J_{s_k}}^{\dagger}\right)\max_{i_k\in J_{s_k}}\left\{\left|b^{(i_k)}-A^{(i_k)}x_k\right|^2\right\}\\
    &=\|x_k-x_\star\|_2^2-\sigma_{\min}^2\left(A_{J_{s_k}}^{\dagger}\right)\left|b^{(h_k)}-A^{(h_k)}x_k\right|^2\\
    &\leq\|x_k-x_\star\|_2^2-\frac{\sigma_{\min}^2\left(A_{J_{s_k}}^{\dagger}\right)}{m-|J_{s_{k-1}}|}\|b-Ax_k\|_2^2\\
    &\leq\|x_k-x_\star\|_2^2-\frac{\sigma_{\min}^2(A)}{\sigma_{\max}^2(A_{J_{s_k}})\left(m-|J_{s_{k-1}}|\right)}\|x_k-x_\star\|_2^2,
    \end{aligned}$$

    where we have used the identity $\sigma_{\min}(A_{J_{s_k}}^{\dagger})=1/\sigma_{\max}(A_{J_{s_k}})$ and the following estimate:

    $$\|Az\|_2^2\geq\sigma_{\min}^2(A)\|z\|_2^2,\quad\forall z\in\mathcal{R}(A^{\top}).$$

    Therefore, for $k=0,1,2,\ldots$, we have

    $$\|x_{k+1}-x_\star\|_2^2\leq\left(1-\frac{\sigma_{\min}^2(A)}{\hat{\sigma}^2\left(m-\hat{N}_{\min}\right)}\right)\|x_k-x_\star\|_2^2, \tag{3.7}$$

    where $\hat{\sigma}=\max_{J_{s_k}}\left\{\sigma_{\max}(A_{J_{s_k}})\right\}$ and $\hat{N}_{\min}=\min_{J_{s_{k-1}}}\left\{|J_{s_{k-1}}|\right\}$. By induction on the iteration index $k$, the estimate (3.3) holds.

    The maximum residual block Kaczmarz algorithm introduces an effective criterion for determining the projection block, namely, the block containing the row with the maximum modulus of the residual vector. However, as the size of the coefficient matrix increases, the time required for the partitioning process increases significantly. Therefore, by solving the problem (1.1), we can select features from the columns of the coefficient matrix $A\in\mathbb{R}^{m\times n}$ and obtain the low-rank matrix $\tilde{A}\in\mathbb{R}^{m\times d}$ $(d\ll n)$. Partitioning the row vectors of $\tilde{A}$ using Algorithm 2.1 then significantly reduces the required time.

    For solving the problem (1.1), the proximal gradient descent method [35] is a straightforward and efficient computational approach, which is a variant of the gradient descent method and is often referred to as the iterative soft-thresholding method [36]. It iteratively updates the solution by combining gradient descent with the proximal operator, effectively handling non-smooth and regularized optimization problems. The proximal operator is defined as follows:

    $$\operatorname{prox}_t(\beta)=\arg\min_{z}\left\{\frac{1}{2t}\|\beta-z\|_2^2+\lambda\|z\|_1\right\}=\arg\min_{z}\left\{\|\beta-z\|_2^2+2\lambda t\|z\|_1\right\}=S_{\lambda t}(\beta),$$

    where $S_{\lambda t}(\beta)$ is the soft-thresholding operator. Choosing any initial approximation $\beta_0$ and $t_0$, the proximal gradient update for problem (1.1) is

    $$\beta_{k+1}=S_{\lambda t_k}\left(\beta_k-t_k\nabla g(\beta_k)\right),\quad k=0,1,2,\ldots,$$

    where $\nabla g(\beta_k)=-A^{\top}(b-A\beta_k)$ with $g(\beta)=\frac{1}{2}\|b-A\beta\|_2^2$; that is,

    $$\beta_{k+1}=S_{\lambda t_k}\left(\beta_k+t_kA^{\top}(b-A\beta_k)\right),\quad k=0,1,2,\ldots, \tag{3.8}$$

    where $t_k$ is the step size [36].
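
    For concreteness, a self-contained MATLAB fragment implementing a fixed-step version of the update (3.8) is shown below; the random problem instance, the step size choice $t=1/\sigma_{\max}^2(A)$, and the iteration count are placeholders.

```matlab
% Proximal-gradient (iterative soft-thresholding) realization of update (3.8)
% with a fixed step size; the data below form a small random placeholder instance.
A = randn(200, 50);  b = randn(200, 1);  lambda = 0.1;
soft = @(x, th) sign(x) .* max(abs(x) - th, 0);   % soft-thresholding operator
t = 1 / norm(A)^2;                                % fixed step, t = 1/sigma_max(A)^2
beta = zeros(50, 1);
for k = 1:200
    beta = soft(beta + t * A' * (b - A * beta), lambda * t);   % update (3.8)
end
```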

    For the value of the parameter $\lambda$, cross-validation can be used to select the optimal regularization parameter, but this requires a substantial amount of computation. Our objective is not to obtain the optimal Lasso solution but to perform feature selection on the columns of the coefficient matrix $A$. We therefore adopt an effective and practical approach: the penalty parameter $\lambda$ is determined by a rough sweep that ensures the number of selected columns falls within a given range. This allows us to obtain feature selection results quickly while maintaining a certain level of accuracy and reducing the computational cost. On this basis, the feature selection algorithm based on Lasso is as follows.

    Algorithm 3.2. The feature selection algorithm based on Lasso.
    Input: $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^{m}$, $\lambda_{step}$, $[\lambda_l,\lambda_r]$, $[d_l,d_r]$, $t_0$, $\beta_0$, max_Iter
    Output: $\tilde{A}\in\mathbb{R}^{m\times d}$
    (1) for $\lambda=\lambda_l:\lambda_{step}:\lambda_r$ do
    (2)      for $k=0,1,2,\ldots,$ max_Iter do
    (3)          Update the step size $t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}$;
    (4)          Compute $\beta_{k+1}=S_{\lambda\tilde{t}}\left(\beta_k+\frac{t_k-1}{t_{k+1}}A^{\top}(b-A\beta_k)\right)$;
    (5)          if $\|\beta_{k+1}-\beta_k\|_2<10^{-6}$, break; end if
    (6)      end for
    (7)      Determine the index set $D=\{\tilde{j}\,|\,\beta_k^{(j)}\neq0,\ j=1,2,\ldots,n\}$, where the number of elements in $D$ is $d$;
    (8)      if $d_l<d<d_r$, break; end if
    (9) end for
    (10) Construct the low-rank matrix $\tilde{A}\in\mathbb{R}^{m\times d}$, whose columns are the columns of $A$ indexed by $D$.

    Remark 3.1. At the $k$-th iterate, the constant step size is $\tilde{t}=\frac{t_k-1}{t_{k+1}}$. The initial value is set to $t_0=\frac{1}{\sigma_{\max}^2(A)}$, which ensures that $0<\tilde{t}<\frac{1}{\sigma_{\max}^2(A)}$ always holds.
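
    The sketch below outlines Algorithm 3.2 in MATLAB. Because the step-size convention of Remark 3.1 depends on the $t_k$ recursion, the sketch simply uses a constant step $\tilde{t}=1/\sigma_{\max}^2(A)$ as a simplifying assumption; the function name, the tolerance, and the early-exit logic follow the listing above.

```matlab
function [Atilde, D] = lasso_feature_select(A, b, lam_grid, dl, dr, maxIter)
% Sketch of Algorithm 3.2: sweep lambda over lam_grid, run a soft-thresholded
% gradient iteration for each value, and accept the first lambda whose support
% size d lies strictly between dl and dr. A constant step ttil = 1/sigma_max(A)^2
% is assumed here in place of the t_k-based step of Remark 3.1.
soft = @(x, th) sign(x) .* max(abs(x) - th, 0);
ttil = 1 / norm(A)^2;
n = size(A, 2);
D = [];
for lambda = lam_grid
    beta = zeros(n, 1);
    for k = 1:maxIter
        beta_new = soft(beta + ttil * A' * (b - A * beta), lambda * ttil);
        if norm(beta_new - beta) < 1e-6, beta = beta_new; break; end
        beta = beta_new;
    end
    D = find(beta ~= 0);                          % index set (1.2); d = numel(D)
    if numel(D) > dl && numel(D) < dr, break; end
end
Atilde = A(:, D);                                 % columns of A indexed by D
end
```

    A call such as `[Atilde, D] = lasso_feature_select(A, b, 0.01:0.01:1, 2, 100, 100);` mirrors the parameter ranges used in the numerical experiments below.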

    Many studies [35,36,37,38] have analyzed the convergence of the proximal gradient descent method. Below is the convergence analysis of Algorithm 3.2 for solving the problem (1.1).

    Theorem 3.2. Given the coefficient matrix $A\in\mathbb{R}^{m\times n}$ and the right-hand side $b\in\mathbb{R}^m$, the iteration sequence $\{\beta_k\}_{k=0}^{\infty}$ generated by the feature selection algorithm based on Lasso, starting from any initial approximation $\beta_0$, converges to the optimal solution $\beta_\star$. Moreover, the solution error for the iteration sequence $\{\beta_k\}_{k=0}^{\infty}$ obeys

    $$\|\beta_{k+1}-\beta_\star\|_2\leq\left(1-\tilde{t}\,\sigma_{\max}^2(A)\right)\|\beta_k-\beta_\star\|_2, \tag{3.9}$$

    where $0<\tilde{t}<\frac{1}{\sigma_{\max}^2(A)}$.

    Proof. From the definition of Algorithm 3.2, we know that

    $$\beta_{k+1}=S_{\lambda\tilde{t}}\left(\beta_k+\tilde{t}A^{\top}(b-A\beta_k)\right),\quad k=0,1,2,\ldots, \tag{3.10}$$

    where $\tilde{t}=\frac{t_k-1}{t_{k+1}}$ and $S_{\lambda\tilde{t}}(\cdot)$ is the soft-thresholding operator. For $n$-dimensional vectors $\beta_1$ and $\beta_2$, where $\beta_1^{(j)}$ and $\beta_2^{(j)}$ represent their $j$-th entries $(j=1,2,\ldots,n)$, if the corresponding entries have the same sign, we have

    $$\left\|S_{\lambda\tilde{t}}(\beta_1)-S_{\lambda\tilde{t}}(\beta_2)\right\|_2=\begin{cases}\left\|(\beta_1-\lambda\tilde{t})-(\beta_2-\lambda\tilde{t})\right\|_2, & \beta_1^{(j)}>0,\ \beta_2^{(j)}>0,\\[1mm]\left\|(\beta_1+\lambda\tilde{t})-(\beta_2+\lambda\tilde{t})\right\|_2, & \beta_1^{(j)}<0,\ \beta_2^{(j)}<0,\end{cases}=\|\beta_1-\beta_2\|_2.$$

    Conversely, if $\beta_1^{(j)}$ and $\beta_2^{(j)}$ have opposite signs, we have

    $$\left\|S_{\lambda\tilde{t}}(\beta_1)-S_{\lambda\tilde{t}}(\beta_2)\right\|_2=\begin{cases}\left\|(\beta_1-\lambda\tilde{t})-(\beta_2+\lambda\tilde{t})\right\|_2, & \beta_1^{(j)}>0,\ \beta_2^{(j)}<0,\\[1mm]\left\|(\beta_1+\lambda\tilde{t})-(\beta_2-\lambda\tilde{t})\right\|_2, & \beta_1^{(j)}<0,\ \beta_2^{(j)}>0,\end{cases}=\begin{cases}\left\|\beta_1-\beta_2-2\lambda\tilde{t}\right\|_2, & \beta_1^{(j)}>0,\ \beta_2^{(j)}<0,\\[1mm]\left\|\beta_1-\beta_2+2\lambda\tilde{t}\right\|_2, & \beta_1^{(j)}<0,\ \beta_2^{(j)}>0,\end{cases}<\|\beta_1-\beta_2\|_2.$$

    Thus, we obtain the estimate

    $$\left\|S_{\lambda\tilde{t}}(\beta_1)-S_{\lambda\tilde{t}}(\beta_2)\right\|_2\leq\|\beta_1-\beta_2\|_2. \tag{3.11}$$

    Using (3.10) and (3.11), we obtain

    $$\begin{aligned}\|\beta_{k+1}-\beta_\star\|_2&=\left\|S_{\lambda\tilde{t}}\left(\beta_k+\tilde{t}A^{\top}(b-A\beta_k)\right)-S_{\lambda\tilde{t}}\left(\beta_\star+\tilde{t}A^{\top}(b-A\beta_\star)\right)\right\|_2\\&\leq\left\|\beta_k+\tilde{t}A^{\top}(b-A\beta_k)-\left(\beta_\star+\tilde{t}A^{\top}(b-A\beta_\star)\right)\right\|_2\\&=\left\|\left(I_n-\tilde{t}A^{\top}A\right)(\beta_k-\beta_\star)\right\|_2\\&\leq\left(1-\tilde{t}\,\sigma_{\max}^2(A)\right)\|\beta_k-\beta_\star\|_2,\end{aligned}$$

    where $0<\tilde{t}<\frac{1}{\sigma_{\max}^2(A)}$.

    Based on the idea of Lasso, we perform feature selection on the columns of the coefficient matrix $A$ by employing Algorithm 3.2, which yields the low-rank matrix $\tilde{A}$. Subsequently, guided by the criterion (3.1) for selecting the projection block, we present the maximum residual block Kaczmarz algorithm based on feature selection (LMBK). The specific implementation steps of LMBK are as follows.

    Algorithm 3.3. The maximum residual block Kaczmarz algorithm based on feature selection: LMBK.
    Input: $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^{m}$, $[\lambda_l,\lambda_r]$, $[d_l,d_r]$, $t_0$, $\beta_0$, max_Iter, $l$, $q$ $(q<m)$, and $x_0$
    Output: $x_l$
    (1) The low-rank matrix $\tilde{A}\in\mathbb{R}^{m\times d}$ is obtained by Algorithm 3.2;
    (2) Partition the rows of the matrix $\tilde{A}$ by Algorithm 2.1 to obtain the block index set $\tilde{J}=[\tilde{J}_1,\ldots,\tilde{J}_q]$ satisfying $\tilde{J}_v\cap\tilde{J}_s=\emptyset$ for $v\neq s$ and $\bigcup_{s=1}^{q}\tilde{J}_s=[m]$;
    (3) for $k=0,1,\ldots,l-1$ do
    (4)          Compute $r_k=b-Ax_k$;
    (5)          Select the projection block $A_{\tilde{J}_{\tilde{s}_k}}$ satisfying
    $\tilde{h}_k=\arg\max_{1\leq i\leq m}|r_k^{(i)}|,\quad \tilde{h}_k\in\tilde{J}_{\tilde{s}_k};$
    (6)          Set $x_{k+1}=x_k+A_{\tilde{J}_{\tilde{s}_k}}^{\dagger}\left(b_{\tilde{J}_{\tilde{s}_k}}-A_{\tilde{J}_{\tilde{s}_k}}x_k\right)$;
    (7) end for
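
    Putting the pieces together, a possible end-to-end realization of the LMBK pipeline, reusing the sketches given earlier (lasso_feature_select, block_row_partition, and mbk), could look as follows; all sizes and parameter values are placeholders only.

```matlab
% End-to-end sketch of Algorithm 3.3 (LMBK) built from the earlier sketches;
% the problem sizes and parameters below are placeholders only.
m = 2000;  n = 500;  q = 10;
A = randn(m, n);  xs = randn(n, 1);  b = A * xs;                   % consistent system
[Atilde, ~] = lasso_feature_select(A, b, 1:0.5:10, 2, 100, 100);   % step (1)
J = block_row_partition(Atilde, q, 50);                            % step (2): partition rows of Atilde
x = mbk(A, b, J, 500, zeros(n, 1));                                % steps (3)-(7): MBK sweep on A itself
```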

    Remark 3.2. In this algorithm, we first select features from the columns of the coefficient matrix $A$ to obtain the low-rank matrix $\tilde{A}$, and then the rows of $\tilde{A}$ are partitioned using Algorithm 2.1, which is notably faster than Step (1) of Algorithm 3.1.

    Theorem 3.3. Let the large linear system $Ax=b$, with the coefficient matrix $A\in\mathbb{R}^{m\times n}$ and the right-hand side $b\in\mathbb{R}^m$, be consistent. The iteration sequence $\{x_k\}_{k=0}^{\infty}$ generated by the LMBK algorithm starting from any initial approximation $x_0\in\mathcal{R}(A^{\top})$ converges to the unique least-norm solution $x_\star=A^{\dagger}b$. Moreover, the solution error for the iteration sequence $\{x_k\}_{k=0}^{\infty}$ obeys

    $$\|x_k-x_\star\|_2^2\leq\left(1-\frac{\sigma_{\min}^2(A)}{\tilde{\sigma}^2\left(m-\tilde{N}_{\min}\right)}\right)^{k}\|x_0-x_\star\|_2^2, \tag{3.12}$$

    where $\tilde{\sigma}=\max_{\tilde{J}_{\tilde{s}_k}}\left\{\sigma_{\max}(A_{\tilde{J}_{\tilde{s}_k}})\right\}$ and $\tilde{N}_{\min}=\min_{\tilde{J}_{\tilde{s}_{k-1}}}\left\{|\tilde{J}_{\tilde{s}_{k-1}}|\right\}$ (defining $|\tilde{J}_{\tilde{s}_{-1}}|=0$).

    Proof. The difference between the MBK and LMBK algorithms lies only in the feature selection preprocessing of the coefficient matrix, which significantly reduces the time required for the partitioning process. The convergence analysis of the LMBK algorithm therefore follows the proof of Theorem 3.1.

    In this section, we implement the greedy randomized Kaczmarz (GRK) method [12], the maximum residual block Kaczmarz (MBK) algorithm, and the maximum residual block Kaczmarz algorithm based on feature selection (LMBK), and demonstrate that the proposed algorithms, in particular LMBK, are numerically advantageous in terms of the number of iteration steps (IT) and the computing time in seconds (CPU).

    The coefficient matrix $A\in\mathbb{R}^{m\times n}$ is obtained from three different sources. The first class of matrices is generated randomly by the MATLAB function randn, the second class is selected from the University of Florida Sparse Matrix Collection [39], and the third class is taken from problems related to image reconstruction. The right-hand side is $b=Ax_\star$, where $x_\star\in\mathbb{R}^n$ is generated by the function randn. Note that $b$ is noise-free in our experiments. Let max_Iter = 100, the initial vector $x_0=0$, and $\beta_0=0$; the relative residual (rres) is defined by

    $$\mathrm{rres}=\frac{\|b-Ax_k\|_2^2}{\|b\|_2^2}.$$

    The iterations are stopped when $x_k$ satisfies $\mathrm{rres}\leq10^{-4}$ or the CPU time exceeds 10,000 seconds.

    All experiments are performed using MATLAB (version R2021a) on a personal computer with 3.10 GHz central processing unit (Intel(R) Core (TM) i5-11300H), 16 GB memory, and a Windows 11 operating system.

    Example 4.1. The coefficient matrix $A\in\mathbb{R}^{m\times n}$ is randomly generated by the MATLAB function randn. Let $[\lambda_l:\lambda_{step}:\lambda_r]=[1:0.5:10]$ and $[d_l,d_r]=[2,100]$; the numerical results of GRK, MBK, and LMBK are reported below. Note that $q$ is used for MBK and LMBK, whereas $d$ is used for LMBK only.

    From Tables 1 and 2, it is observed that the LMBK algorithm outperforms both the GRK and MBK algorithms in terms of IT and CPU. Furthermore, as the size of the coefficient matrix A increases, the GRK method fails to converge within the specified time, whereas the two proposed algorithms in this paper perform quite well. To visualize the performance of these three algorithms, we plot the convergence curves based on the aforementioned numerical results as follows.

    Table 1.  IT and CPU of GRK, MBK, and LMBK for matrix $A$ with $n=3000$ and different $m$ ("–" means the method did not converge within the CPU limit).

    m×n             q    d     GRK IT    GRK CPU     MBK IT   MBK CPU     LMBK IT   LMBK CPU
    20,000×3000     10   7     4154      64.1042     9        11.9749     9         3.2472
    50,000×3000     20   9     3033      117.6046    6        30.0806     5         5.7335
    80,000×3000     40   18    –         –           9        60.7425     9         12.7522
    100,000×3000    60   17    –         –           12       77.6643     12        18.8722
    200,000×3000    80   11    –         –           6        304.5365    6         54.6815

    Table 2.  IT and CPU of GRK, MBK, and LMBK for matrix $A$ with $m=80,000$ and different $n$ ("–" means the method did not converge within the CPU limit).

    m×n              q    d     GRK IT    GRK CPU     MBK IT   MBK CPU     LMBK IT   LMBK CPU
    80,000×2000      40   12    1710      123.3912    1        35.2884     1         6.4711
    80,000×4000      40   22    –         –           14       87.8377     14        24.3464
    80,000×6000      40   28    –         –           23       94.8711     23        37.9381
    80,000×8000      40   3     –         –           32       150.6195    31        39.0150
    80,000×10,000    40   42    –         –           42       238.9067    41        73.1683


    Figure 2 intuitively demonstrates the sharp drop in the convergence curves of both the MBK and LMBK algorithms, underscoring their effectiveness. The key difference between the two algorithms is that the LMBK algorithm first selects features from the columns of the coefficient matrix, which greatly reduces the time for the partition process. The following figure compares the time for the partition process (Pcpu) of these two algorithms, corresponding to the matrices listed in Tables 1 and 2, respectively.

    Figure 2.  Convergence curves of GRK, MBK, and LMBK.

    From Figure 3, it is evident that after the feature selection of the coefficient matrix using Algorithm 3.2, the time for the partition process is significantly reduced. Since the quality of the partitioning also directly affects the convergence rate, we provide the following figure to compare the time for the iteration process (ITcpu) for these two algorithms, corresponding to the matrices listed in Tables 1 and 2, respectively.

    Figure 3.  CPU for the partition process.

    From Figure 4, it can be observed that the iteration times of the MBK and LMBK algorithms are nearly the same, with that of LMBK being slightly smaller. Therefore, by employing Algorithm 3.2 for feature selection on the coefficient matrix and subsequently partitioning its rows, the obtained partitioning result is reliable.

    Figure 4.  CPU for the iteration process.

    Example 4.2. The coefficient matrix $A\in\mathbb{R}^{m\times n}$ is selected from the University of Florida Sparse Matrix Collection. Let $[\lambda_l:\lambda_{step}:\lambda_r]=[10^{-2}:10^{-2}:1]$ and $[d_l,d_r]=[2,100]$; the numerical results of GRK, MBK, and LMBK are given in Table 3.

    Table 3.  IT and CPU of GRK, MBK, and LMBK ("–" means the method did not converge within the CPU limit).

    name           m×n               q    d     GRK IT    GRK CPU     MBK IT    MBK CPU      LMBK IT   LMBK CPU
    Franz10        19,588×4164       5    48    5574      3.3594      7         1.3927       1         0.4788
    scsd8-2c       35,910×5130       15   38    20,824    27.5643     10        5.0037       1         1.1135
    scsd8-2r       60,550×8650       25   20    33,384    66.8752     20        19.3880      10        6.0809
    lp_nug20       72,600×15,240     30   38    17,779    68.4472     60        48.2827      5         20.8793
    mesh_deform    234,023×9393      85   10    24,131    284.8606    60        199.8508     4         28.9125
    fome20         108,175×33,874    30   41    –         –           148       1465.9682    10        420.4182


    In Table 3, the GRK method fails for the matrix fome20, and the IT and CPU of LMBK are considerably smaller than those of the GRK and MBK algorithms for all convergent cases. We also describe the convergence curves in Figure 5.

    Figure 5.  Convergence curves of GRK, MBK, and LMBK.

    According to the display in Figure 5, the LMBK algorithm exhibits the fastest convergence speed among the three algorithms. In comparison, the convergence speed of the GRK method is significantly slower.

    Example 4.3. In this example, the GRK, MBK, and LMBK algorithms are used to reconstruct two test images. The structural similarity index (SSIM) [40] is used as the metric to evaluate the similarity between the test images and the reconstructed images; it comprehensively considers brightness, contrast, and structure, and provides a score within the range $[-1,1]$. When $\mathrm{SSIM}=1$, the two images are identical. Let $[\lambda_l:\lambda_{step}:\lambda_r]=[10^{-2}:10^{-2}:1]$ and $[d_l,d_r]=[2,1000]$.
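
    As a minimal illustration of how the SSIM score can be computed in MATLAB (Image Processing Toolbox), the fragment below compares a stand-in noisy reconstruction against the Shepp-Logan phantom; the noise level and image size are placeholders and do not reproduce the experiments reported here.

```matlab
% Minimal SSIM check (Image Processing Toolbox): compare a stand-in
% reconstruction against the reference phantom image.
N = 60;
x_true = phantom(N);                                     % reference test image
x_rec  = x_true + 0.02 * max(x_true(:)) * randn(N);      % placeholder "reconstruction"
score  = ssim(mat2gray(x_rec), mat2gray(x_true));        % SSIM in [-1, 1]
```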

    The first test image is obtained by loading human 3-D chest CT scan data into the MATLAB workspace, extracting the central slice in the X-Y plane, and performing lung segmentation on it, as shown in subplot (a) of Figure 6. Let $N=90$, $\theta=0:1:179$, $p=\sqrt{2}N$, and $q=5$; we then obtain a coefficient matrix $A$ of size 22,860 × 8100 [30]. The GRK, MBK, and LMBK algorithms are used to solve this linear system, and the corresponding reconstructed images are displayed in the following figure.

    Figure 6.  Image reconstruction.

    In Figure 6, it is observed that the corresponding subplot (d) for the LMBK algorithm is the clearest, followed by the MBK algorithm (see subplot (c)). The image for the GRK method (see subplot (b)) is relatively blurry.

    The second test image is obtained by using the function phantom in the Image Processing Toolbox. Let $N=100$, $\theta=0:1:179$, $p=\sqrt{2}N$, and $q=15$; the coefficient matrix $A$, of size 25,380 × 10,000, is then obtained. The corresponding reconstructed images are displayed in Figure 7.

    Figure 7.  Image reconstruction.

    Figure 7 clearly demonstrates that the reconstructed images obtained by the LMBK algorithm (see subplot (d)) are the clearest. To provide a more intuitive evaluation of image reconstruction achieved by the three algorithms, Tables 4 and 5 list the corresponding IT, CPU, and SSIM values for each algorithm.

    Table 4.  Test image 1 (d=196).
    GRK MBK LMBK
    IT 196 1 1
    CPU 12.5840 10.4741 3.9521
    SSIM 0.6171 1.0000 1.0000

    Table 5.  Test image 2 (d=554).
    GRK MBK LMBK
    IT 8248 2 2
    CPU 71.6283 58.2951 11.7294
    SSIM 0.6198 1.0000 1.0000


    According to Tables 4 and 5, the SSIM values for MBK and LMBK are both 1, indicating that the images reconstructed by these two algorithms are highly similar to the test images. In contrast, the SSIM values for GRK are 0.6171 in Table 4 and 0.6198 in Table 5, which are relatively low, consistent with the blurriness of the GRK reconstructions in Figures 6 and 7. Additionally, the reconstruction CPU time of the LMBK algorithm is the shortest.

    In addition, we added 2% white noise to the second test image and performed image reconstruction. Let the image pixel size be $N=60$, with $p=\sqrt{2}N$ and $q=5$; we then obtain a coefficient matrix $A$ of size 15,120 × 3600. The reconstructed images generated by the GRK, MBK, and LMBK algorithms are shown in Figure 8.

    Figure 8.  Image reconstruction.

    As shown in Figure 8, the GRK, MBK, and LMBK algorithms all successfully reconstruct the test image with 2% white noise. The corresponding IT, CPU, and SSIM values of these algorithms are compared below to evaluate their performance in image reconstruction more intuitively.

    From Table 6, the numerical results indicate that the LMBK algorithm performs the best in terms of IT and CPU. Both the MBK and LMBK algorithms achieve an SSIM value of 1, while the SSIM value for the GRK method is 0.7578, demonstrating the effectiveness of the MBK and LMBK algorithms proposed in this paper.

    Table 6.  Image with 2% White Noise (d=227).
    GRK MBK LMBK
    IT 4821 1 1
    CPU 17.9093 3.4363 1.3232
    SSIM 0.7578 1.0000 1.0000


    In this paper, to efficiently solve large linear systems, we consider selecting features from the columns of the coefficient matrix, and propose the feature selection algorithm based on Lasso. Subsequently, we introduce a new criterion for selecting the projection block and propose two solving algorithms. One is the maximum residual block Kaczmarz algorithm, and the other is the maximum residual block Kaczmarz algorithm based on feature selection. The convergence theory of these algorithms is analyzed, and their effectiveness is demonstrated through numerical results. Additionally, we also verify the performance of the proposed algorithms in image reconstruction.

    Ran-Ran Li: Methodology, data curation, writing–original draft; Hao Liu: Conceptualization, methodology, writing–review and editing. Both of the authors have read and approved the final version of the manuscript for publication.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This research was supported in part by the National Natural Science Foundation of China (Grant Nos. 11401305 and 11571171) and Shenzhen Science and Technology Program (Grant No. JCYJ 20230807142002006).

    The authors declare no conflicts of interest.



    [1] R.-R. Li, H. Liu, On randomized partial block Kaczmarz method for solving huge linear algebraic systems, Comput. Appl. Math., 41 (2022), 278. https://doi.org/10.1007/s40314-022-01978-0 doi: 10.1007/s40314-022-01978-0
    [2] Y. Censor, Row-action methods for huge and sparse systems and their applications, SIAM Rev., 23 (1981), 444–466. https://doi.org/10.1137/1023097 doi: 10.1137/1023097
    [3] C. Brezinski, Projection methods for systems of equations, Amsterdam: Elsevier, 1997.
    [4] Z.-Z. Bai, C.-H. Jin, Column-decomposed relaxation methods for the overdetermined systems of linear equations, Int. J. Appl. Math., 13 (2003), 71–82.
    [5] S. Kaczmarz, Angenäherte Auflösung von Systemen linearer Gleichungen, Bull. Int. Acad. Polon. Sci. Lett. A, 35 (1937), 355–357.
    [6] F. Natterer, The mathematics of computerized tomography, Philadelphia: SIAM, 2001.
    [7] G. T. Herman, Fundamentals of computerized tomography: Image reconstruction from projections, Springer, 2009.
    [8] C. Byrne, A unified treatment of some iterative algorithms in signal processing and image reconstruction, Inverse Probl., 20 (2004), 103–120. https://doi.org/10.1088/0266-5611/20/1/006 doi: 10.1088/0266-5611/20/1/006
    [9] C. Popa, R. Zdunek, Kaczmarz extended algorithm for tomographic image reconstruction from limited-data, Math. Comput. Simulat., 65 (2004), 579–598. https://doi.org/10.1016/j.matcom.2004.01.021 doi: 10.1016/j.matcom.2004.01.021
    [10] T. Strohmer, R. Vershynin, A randomized Kaczmarz algorithm with exponential convergence, J. Fourier Anal. Appl., 15 (2009), 262–278. https://doi.org/10.1007/s00041-008-9030-4 doi: 10.1007/s00041-008-9030-4
    [11] Z.-Z. Bai, W.-T. Wu, On convergence rate of the randomized Kaczmarz method, Linear Algebra Appl., 553 (2018), 252–269. https://doi.org/10.1016/j.laa.2018.05.009 doi: 10.1016/j.laa.2018.05.009
    [12] Z.-Z. Bai, W.-T. Wu, On greedy randomized Kaczmarz method for solving large sparse linear systems, SIAM J. Sci. Comput., 40 (2018), 592–606. https://doi.org/10.1137/17m1137747 doi: 10.1137/17m1137747
    [13] Z.-Z. Bai, W.-T. Wu, On relaxed greedy randomized Kaczmarz methods for solving large sparse linear systems, Appl. Math. Lett., 83 (2018), 21–26. https://doi.org/10.1016/j.aml.2018.03.008 doi: 10.1016/j.aml.2018.03.008
    [14] J.-J. Zhang, A new greedy Kaczmarz algorithm for the solution of very large linear systems, Appl. Math. Lett., 91 (2019), 207–212. https://doi.org/10.1016/j.aml.2018.12.022 doi: 10.1016/j.aml.2018.12.022
    [15] Y. Liu, C.-Q. Gu, Variant of greedy randomized Kaczmarz for ridge regression, Appl. Numer. Math., 143 (2019), 223–246. https://doi.org/10.1016/j.apnum.2019.04.008 doi: 10.1016/j.apnum.2019.04.008
    [16] Z.-Z. Bai, W.-T. Wu, On greedy randomized augmented Kaczmarz method for solving large sparse inconsistent linear systems, SIAM J. Sci. Comput., 43 (2021), A3892–A3911. https://doi.org/10.1137/20m1352235 doi: 10.1137/20m1352235
    [17] S. Steinerberger, A weighted randomized Kaczmarz method for solving linear systems, Math. Comput., 90 (2021), 2815–2826. https://doi.org/10.1090/mcom/3644 doi: 10.1090/mcom/3644
    [18] X. Yang, A geometric probability randomized Kaczmarz method for large scale linear systems, Appl. Numer. Math., 164 (2021), 139–160. https://doi.org/10.1016/j.apnum.2020.10.016 doi: 10.1016/j.apnum.2020.10.016
    [19] Z.-Z. Bai, L. Wang, On convergence rates of Kaczmarz-type methods with different selection rules of working rows, Appl. Numer. Math., 186 (2023), 289–319. https://doi.org/10.1016/j.apnum.2023.01.013 doi: 10.1016/j.apnum.2023.01.013
    [20] Z.-Z. Bai, W.-T. Wu, Randomized Kaczmarz iteration methods: Algorithmic extensions and convergence theory, Japan J. Indust. Appl. Math., 40 (2023), 1421–1443. https://doi.org/10.1007/s13160-023-00586-7 doi: 10.1007/s13160-023-00586-7
    [21] A. Ma, D. Molitor, Randomized Kaczmarz for tensor linear systems, BIT Numer. Math., 62 (2022), 171–194. https://doi.org/10.1007/s10543-021-00877-w doi: 10.1007/s10543-021-00877-w
    [22] X.-Z. Wang, M.-L. Che, C.-X. Mo, Y.-M. Wei, Solving the system of nonsingular tensor equations via randomized Kaczmarz-like method, J. Comput. Appl. Math., 421 (2023), 114856. https://doi.org/10.1016/j.cam.2022.114856 doi: 10.1016/j.cam.2022.114856
    [23] S.-Y. Zhong, G.-X. Huang, An almost-maximal residual tensor block Kaczmarz method for large tensor linear systems, J. Comput. Appl. Math., 437 (2024), 115439. https://doi.org/10.1016/j.cam.2023.115439 doi: 10.1016/j.cam.2023.115439
    [24] T. Elfving, Block-iterative methods for consistent and inconsistent linear equations, Numer. Math., 35 (1980), 1–12. https://doi.org/10.1007/bf01396365 doi: 10.1007/bf01396365
    [25] D. Needell, J. A. Tropp, Paved with good intentions: Analysis of a randomized block Kaczmarz method, Linear Algebra Appl., 441 (2014), 199–221. https://doi.org/10.1016/j.laa.2012.12.022 doi: 10.1016/j.laa.2012.12.022
    [26] I. Necoara, Faster randomized block Kaczmarz algorithms, SIAM J. Matrix Anal. Appl., 40 (2019), 1425–1452. https://doi.org/10.1137/19m1251643 doi: 10.1137/19m1251643
    [27] C.-Q. Miao, W.-T. Wu, On greedy randomized average block Kaczmarz method for solving large linear systems, J. Comput. Appl. Math., 413 (2022), 114372. https://doi.org/10.1016/j.cam.2022.114372 doi: 10.1016/j.cam.2022.114372
    [28] D. Needell, R. Zhao, A. Zouzias, Randomized block Kaczmarz method with projection for solving least squares, Linear Algebra Appl., 484 (2015), 322–343. https://doi.org/10.1016/j.laa.2015.06.027 doi: 10.1016/j.laa.2015.06.027
    [29] Y. Liu, C.-Q. Gu, On greedy randomized block Kaczmarz method for consistent linear systems, Linear Algebra Appl., 616 (2021), 178–200. https://doi.org/10.1016/j.laa.2021.01.024 doi: 10.1016/j.laa.2021.01.024
    [30] R.-R. Li, H. Liu, On global randomized block Kaczmarz method for image reconstruction, Electron. Res. Arch., 30 (2022), 1442–1453. https://doi.org/10.3934/era.2022075 doi: 10.3934/era.2022075
    [31] J.-Q. Chen, Z.-D. Huang, On a fast deterministic block Kaczmarz method for solving large-scale linear systems, Numer. Algor., 89 (2022), 1007–1029. https://doi.org/10.1007/s11075-021-01143-4 doi: 10.1007/s11075-021-01143-4
    [32] X.-L. Jiang, K. Zhang, J.-F. Yin, Randomized block Kaczmarz methods with k-means clustering for solving large linear systems, J. Comput. Appl. Math., 403 (2022), 113828. https://doi.org/10.1016/j.cam.2021.113828 doi: 10.1016/j.cam.2021.113828
    [33] T. Hastie, R. Tibshirani, M. Wainwright, Statistical learning with sparsity: The Lasso and generalizations, London: Chapman and Hall/CRC, 2015.
    [34] D. L. Donoho, De-noising by soft-thresholding, IEEE Trans. Inf. Theory, 41 (1995), 613–627. https://doi.org/10.1109/18.382009 doi: 10.1109/18.382009
    [35] L. Xiao, T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM J. Optim., 24 (2014), 2057–2075. https://doi.org/10.1137/140961791 doi: 10.1137/140961791
    [36] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imag. Sci., 2 (2009), 183–202. https://doi.org/10.1137/080716542 doi: 10.1137/080716542
    [37] K. Bredies, D. A. Lorenz, Linear convergence of iterative soft-thresholding, J. Fourier Anal. Appl., 14 (2008), 813–837. https://doi.org/10.1007/s00041-008-9041-1 doi: 10.1007/s00041-008-9041-1
    [38] H. Karimi, J. Nutini, M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition, In: Machine learning and knowledge discovery in databases, Cham: Springer, 2016, 795–811. https://doi.org/10.1007/978-3-319-46128-1_50
    [39] T. A. Davis, Y. Hu, The university of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), 1–25. https://doi.org/10.1145/2049662.2049663 doi: 10.1145/2049662.2049663
    [40] U. Sara, M. Akter, M. S. Uddin, Image quality assessment through FSIM, SSIM, MSE and PSNR-a comparative study, J. Comput. Commun., 7 (2019), 8–18. https://doi.org/10.4236/jcc.2019.73002 doi: 10.4236/jcc.2019.73002
  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)