1. Introduction
Consider the following large-scale system of linear equations:

Ax = b,  (1.1)

where A ∈ R^{m×n}, b ∈ R^m, and x ∈ R^n is the unknown vector; researchers often employ iterative methods [1,2] to solve such systems. A simple and effective method is the Kaczmarz method [3], which is extensively utilized across various large-scale computing fields, including computed tomography [4,5,6,7,8], image reconstruction [9,10,11], signal processing [11,12], distributed computing [13,14], etc. As a row-iteration method, the classic Kaczmarz method updates the solution vector by selecting the working rows of the coefficient matrix in sequence and orthogonally projecting the current iterate onto the hyperplane determined by the selected row. To be more specific, let A^{(i)} denote the ith row of A and b^{(i)} the ith entry of b. Given an initial approximation x_0, the Kaczmarz method can be expressed as follows:

x_{k+1} = x_k + ((b^{(i_k)} − A^{(i_k)} x_k) / ‖A^{(i_k)}‖_2^2) (A^{(i_k)})^T,  k = 0, 1, 2, …,
where (·)^T and ‖·‖_2 denote the transpose and the Euclidean norm of a vector or matrix, respectively, and the target row index is i_k = mod(k, m) + 1.
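For illustration, the following Python/NumPy sketch applies the cyclic Kaczmarz update described above to a small randomly generated consistent system; the variable names, sizes, and iteration count are ours and serve only as an example.

```python
import numpy as np

def kaczmarz(A, b, x0, num_iters):
    """Classic (cyclic) Kaczmarz method: project onto one row hyperplane per step."""
    m, _ = A.shape
    x = x0.astype(float).copy()
    for k in range(num_iters):
        i = k % m  # cyclic row selection, i_k = mod(k, m) + 1 in 1-based indexing
        a_i = A[i, :]
        # orthogonal projection of x onto the hyperplane a_i^T x = b_i
        x += (b[i] - a_i @ x) / (a_i @ a_i) * a_i
    return x

# small illustrative example: overdetermined consistent system
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
x = kaczmarz(A, b, np.zeros(10), 5000)
print(np.linalg.norm(x - x_true))  # should be close to zero
```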
The theory of the Kaczmarz method has seen significant development since its inception. Kaczmarz [3] first demonstrated the convergence of the method. Subsequently, Galántai [15] provided an upper bound on its convergence rate, while Knight carried out an error analysis of the method under finite-precision arithmetic in [16]. In recent years, Bai and Liu [17] established a new convergence theorem for the Kaczmarz method using the block Meany inequality. For further research literature, please refer to [18,19].
When the scale of the coefficient matrix is very large, the efficiency of the classic Kaczmarz method decreases significantly because it selects rows in a purely sequential manner. In 2009, Strohmer and Vershynin [20] introduced the randomized Kaczmarz (RK) method with an expected exponential rate of convergence for solving overdetermined consistent linear systems, reigniting interest in Kaczmarz methods. They refined the row selection strategy by selecting the working row with probability Pr(row = i_k) = ‖A^{(i_k)}‖_2^2 / ‖A‖_F^2, resulting in a substantial acceleration of the convergence rate. Subsequently, Bai and Wu [21] proposed a greedy randomized Kaczmarz (GRK) method, which introduces a greedy probability criterion favoring the components of the residual vector with larger modulus in each iteration, so that they are eliminated first during the iteration process, thereby accelerating convergence. It is worth noting that Ansorge [22] proposed a maximal residual Kaczmarz (MRK) method, which Popa analyzed in [23]. Similar in spirit to the row selection of the GRK method, the MRK method selects the working row index i_k so that the i_k-th component of the residual has the largest absolute value among all components, i.e., i_k = argmax_{1≤i≤m} |b^{(i)} − A^{(i)} x_k|. Based on the GRK method, Bai and Wu [24] proposed a more efficient method, named the relaxed greedy randomized Kaczmarz (RGRK) method, by introducing a relaxation factor. Zhang [25] developed a new greedy Kaczmarz method, and Bai and Wu proved a sharper convergence upper bound for the randomized Kaczmarz method in [26]. For further study of the randomized Kaczmarz method and its variants, see [27,28,29,30,31].
For iterative solutions of large linear systems, a natural idea for further accelerating the Kaczmarz method is to use block iterations instead of single-row iterations, and thus the block Kaczmarz method arises. Bai and Liu proved the convergence of the block Kaczmarz method in [17]. Needell et al. pointed out in [32] that when the matrix has a good row paving, a block strategy depending on eigenvalue bounds of the blocks can be introduced, and they proposed the first block Kaczmarz method with (expected) linear convergence for solving overdetermined least squares problems, which projects the current iterate onto the solution space of the selected constraint subset at each iteration step. To be specific, if the subset J_k is selected at the kth iteration and A_{J_k}^† denotes the Moore–Penrose pseudo-inverse of A_{J_k}, the iteration formula for x_k can be expressed as:

x_{k+1} = x_k + A_{J_k}^† (b_{J_k} − A_{J_k} x_k),
with A_{J_k} = A(J_k, :) and b_{J_k} = b(J_k). Needell et al. then proposed a randomized block Kaczmarz (RBK) method for solving the least squares problem in [33]. As a natural follow-up to the RBK method, Liu and Gu [34] proposed the greedy randomized block Kaczmarz (GRBK) method for solving consistent linear systems. To avoid the high computational cost of pseudo-inverse calculations during block iterations, Necoara [35] proposed the randomized average block Kaczmarz (RABK) method, which projects the current iterate onto each row of the selected submatrix and averages the projections using different extrapolated step sizes. Subsequently, based on the idea of the average block method, Miao and Wu [36] extended the GRBK method to propose the greedy randomized average block Kaczmarz (GRABK) method, while Li et al. introduced the greedy average block Kaczmarz (GABK) method in [37]. Recently, Xiao et al. [38] combined the greedy strategy with the average block method to develop the fast greedy randomized block Kaczmarz method. For more research on the block Kaczmarz method, refer to [39,40,41,42,43].
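To make the block update concrete, here is a minimal Python/NumPy sketch of one such projection step; applying the pseudo-inverse through a least-squares solve rather than forming it explicitly is an implementation choice on our part (the experiments in Section 4 use CGLS for this purpose).

```python
import numpy as np

def block_kaczmarz_step(A, b, x, J):
    """One block Kaczmarz step: x_{k+1} = x_k + A_J^†(b_J - A_J x_k)."""
    A_J = A[J, :]
    r_J = b[J] - A_J @ x
    # the minimum-norm least-squares solution of A_J z = r_J equals A_J^† r_J
    z, *_ = np.linalg.lstsq(A_J, r_J, rcond=None)
    return x + z
```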
In this paper, by combining the row selection strategy of the MRK method [22] with the row partitioning strategy of the RBK method [33], we construct the maximum residual block Kaczmarz (MRBK) method, which is designed to prioritize the elimination of the largest block of the residual vector r_k in every iteration. Simultaneously, in order to avoid the pseudo-inverse calculation of the MRBK method during iteration, we further develop the maximum residual average block Kaczmarz (MRABK) method, which completes each iteration by projecting the current iterate x_k onto each row of A_{V_{i_k}} and averaging the projections with different extrapolated step sizes. We give a deterministic convergence analysis for both methods. Numerical experiments show that the MRABK method outperforms the MRBK method, while both methods are more efficient than the GRK, MRK, RBK, and GRBK methods.
The structure of this paper is outlined as follows. We introduce the maximum residual block Kaczmarz method and prove its convergence in Section 2. In Section 3, we establish the maximum residual average block Kaczmarz method and analyze its convergence. Section 4 features numerical experiments that validate the effectiveness of the proposed methods. Lastly, Section 5 ends the paper with conclusions.
Some basic assumptions: For a real matrix A ∈ R^{m×n}, ‖A‖_2, ‖A‖_F, and A^† signify the Euclidean norm, the Frobenius norm, and the Moore–Penrose pseudo-inverse of A, respectively. We call A a standardized matrix if each of its rows has unit Euclidean norm, i.e., ‖A^{(i)}‖_2 = 1 for all i = 1, 2, …, m. Similarly, for a given vector u, ‖u‖_2 represents its Euclidean norm. The notations σ_min(A) and σ_max(A) denote the smallest nonzero and the largest singular values of A, respectively. Additionally, we define the set [m] = {1, 2, …, m}, where m is an arbitrary positive integer. We call V = {V_1, V_2, …, V_t} a partition of [m] if the index sets V_i (i = 1, 2, …, t) satisfy V_i ∩ V_j = ∅ for i ≠ j and ∪_{i=1}^t V_i = [m]. Furthermore, for a given row index set V_i, we write A_{V_i} for the row submatrix of A corresponding to V_i and b_{V_i} for the corresponding subvector of b. The identity matrix of appropriate dimensions is denoted by I. Define the randomized partition of [m] as V = {V_1, V_2, …, V_t} with
where i = 1, 2, …, t; all row partitions used in the remainder of this paper are assumed to be of the form (1.2).
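Since the displayed definition (1.2) is not reproduced here, the Python sketch below shows one common way to realize a randomized partition of [m] into t disjoint, nearly equal-sized blocks (a uniform shuffle followed by a split); the exact construction in (1.2) may differ in detail.

```python
import numpy as np

def random_row_partition(m, t, seed=None):
    """Shuffle the row indices {0, ..., m-1} uniformly at random and split them
    into t disjoint blocks of nearly equal size (one possible realization of a
    randomized partition; see (1.2) for the definition used in the paper)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(m)
    return np.array_split(perm, t)

# example: partition 10 row indices into 3 blocks
print(random_row_partition(10, 3, seed=1))
```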
2. Maximum residual block Kaczmarz method
In this section, drawing inspiration from the ideas behind the MRK [22], RBK [33], and GRBK [34] methods, we are going to construct the maximum residual block Kaczmarz (MRBK) method and analyze its convergence property.
There are typically two approaches to accelerating the Kaczmarz method: the first focuses on selecting working rows more efficiently so as to make more progress per iteration, while the second replaces single-row iterations with row-block iterations. Building upon both approaches, we propose the MRBK method, presented as Method 1. First, we partition the rows of matrix A to obtain the row block division V of A, i.e., {A_{V_1}, A_{V_2}, …, A_{V_t}}. Next, we denote by r_k^{(i)} the ith block component of the residual vector r_k, so that r_k^{(i)} = b_{V_i} − A_{V_i} x_k for i = 1, 2, …, t. We select the working row block V_{i_k} according to i_k = argmax_{1≤i≤t} ‖b_{V_i} − A_{V_i} x_k‖_2^2, ensuring that the largest block component of the residual vector r_k is eliminated first in each iteration, which significantly improves the convergence rate.
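The following Python/NumPy sketch summarizes the procedure just described (Method 1 itself is not reproduced here): block residual norms determine the working block, and the pseudo-inverse is applied via a least-squares solve; the block construction and stopping rule are simplified for illustration.

```python
import numpy as np

def mrbk(A, b, blocks, x0, max_iters=10000, tol=1e-6):
    """Maximum residual block Kaczmarz (sketch): at every iteration, pick the
    block whose residual has the largest squared Euclidean norm and project
    the current iterate onto the solution set of that block."""
    x = x0.astype(float).copy()
    b_norm = np.linalg.norm(b)
    for _ in range(max_iters):
        # squared residual norm of every block
        norms = [np.linalg.norm(b[V] - A[V, :] @ x) ** 2 for V in blocks]
        ik = int(np.argmax(norms))        # i_k = argmax_i ||b_{V_i} - A_{V_i} x_k||_2^2
        V = blocks[ik]
        r_V = b[V] - A[V, :] @ x
        z, *_ = np.linalg.lstsq(A[V, :], r_V, rcond=None)  # A_{V_{i_k}}^† r_k^{(i_k)}
        x += z
        if np.linalg.norm(b - A @ x) <= tol * b_norm:
            break
    return x
```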
Next, we analyze the differences and improvements of the MRBK method relative to the other three methods:
(1) The MRBK method versus the MRK method: The MRBK method accelerates the MRK method naturally by utilizing row block iterations instead of single-row iterations.
(2) The MRBK method versus the RBK and GRBK methods: Compared with the RBK method, the GRBK method introduces a new greedy probability criterion to randomly select the index of the row blocks, which ensures that row blocks with large residuals are prioritized for elimination and thus accelerates the RBK method. However, the GRBK method must construct an index set of candidate row blocks and then select among them according to a probability in each iteration. In contrast, our proposed MRBK method directly selects the row block with the largest residual, enhancing the efficiency of each iteration. In fact, following the "greedy" idea, our method can be called "extremely greedy" in its selection of the row-block index.
Definition 2.1. [33] (Row paving) A (t, α, β) row paving of an m×n matrix A is a partition V = {V_1, V_2, …, V_t} of its row indices such that for each V_i ∈ V the following inequalities hold:

α ≤ σ_min^2(A_{V_i}) ≤ σ_max^2(A_{V_i}) ≤ β,
where t is referred to as the size of the paving, while α and β are known as the lower and upper paving bounds, respectively.
For the MRBK method, Theorem 2.1 is given to illustrate its convergence.
Theorem 2.1. Assume that the linear system (1.1) is consistent and that, for a fixed partition V = {V_1, V_2, …, V_t} of [m], σ_max^2(A_{V_i}) ≤ β holds for each V_i ∈ V. Then, starting from any initial vector x_0 ∈ R(A^T), the iteration sequence {x_k}_{k=0}^∞ generated by the MRBK method converges to the unique least-norm solution x_⋆ = A^†b. Moreover, for any k ≥ 0, we have
and
Proof. From the definition of the MRBK method, for a partition V = {V_1, V_2, …, V_t}, k = 0, 1, 2, …, and i_k ∈ {1, 2, …, t}, we have
Since A_{V_{i_k}} x_⋆ = b_{V_{i_k}}, it holds that
Since A_{V_{i_k}}^† A_{V_{i_k}} is an orthogonal projector, we can derive the following relation using the Pythagorean theorem:
We note that
Furthermore, from Method 1, we have
Thus we obtain, for k=1,2,…, that
and hence
Combining (2.3), (2.4), and (2.5), for k=1,2,…, we finally have
In particular, when k=0, we also obtain
Thus, we have
□
Subsequently, we aim to conduct a more in-depth analysis of the convergence factors associated with the MRBK, RBK, and GRBK methods. The convergence factors of these three methods are denoted as ρMRBK, ρRBK, and ρGRBK, respectively. Based on Theorem 2.1, we derive
and from Theorem 3.1 in [34], we can get
where ζ = min_{V_i∈V} ‖A_{V_i}‖_F^2.
Assuming that the matrix A is a standardized matrix, we can obtain from Theorem 2.1 in [33] that
We observe that ρMRBK<ρRBK as long as t−1<m. In order to compare ρMRBK and ρGRBK, we consider rewriting ρMRBK:
For the row paving {A_{V_1}, A_{V_2}, …, A_{V_t}} of the standardized matrix A, we assume that all blocks V_i have the same cardinality ⌊m/t⌋, where ⌊·⌋ denotes rounding down to an integer. Thus
and
Substituting (2.9) into (2.8) and (2.10) into (2.7), we conclude that ρMRBK < ρGRBK.
Remark 2.1. Based on the preceding analysis, it is evident that when the dimensions of matrix A are large, the convergence factors of the MRBK and GRBK methods are very close, indicating that their iteration counts should be similar. However, owing to the lower computational cost of selecting the working row block A_{V_{i_k}} in the MRBK method, it exhibits a faster convergence rate in practice. The numerical experiments in Section 4 verify this inference.
3. Maximum residual average block Kaczmarz method
In this section, we consider a further improvement of the MRBK method. In step 4 of Method 1, each update of x_k requires applying the pseudo-inverse of A_{V_{i_k}} to a vector, which is computationally expensive when the size of matrix A is very large. We develop the maximum residual average block Kaczmarz (MRABK) method by projecting x_k onto each row of A_{V_{i_k}} and averaging the projections with different extrapolated step sizes, avoiding the computation of the pseudo-inverse and greatly reducing the computational cost of each iteration, as shown in Method 2.
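A minimal Python sketch of an MRABK-style iteration is given below. Method 2 is not reproduced here, so the extrapolated step size is written in the form commonly used by average block Kaczmarz methods, α_k = ω‖r_k^{(i_k)}‖_2^2 / ‖A_{V_{i_k}}^T r_k^{(i_k)}‖_2^2, and the weighted average of the row projections is expressed as a single step along A_{V_{i_k}}^T r_k^{(i_k)}; the precise step size of Method 2 should be taken from the paper.

```python
import numpy as np

def mrabk_step(A, b, blocks, x, omega=1.0):
    """One maximum residual average block step (sketch): no pseudo-inverse is
    needed, only one product with A_V and one with A_V^T."""
    norms = [np.linalg.norm(b[V] - A[V, :] @ x) ** 2 for V in blocks]
    ik = int(np.argmax(norms))                 # block with the largest residual
    V = blocks[ik]
    r_V = b[V] - A[V, :] @ x
    if not np.any(r_V):
        return x                               # this block is already satisfied
    g = A[V, :].T @ r_V                        # direction A_{V_{i_k}}^T r_k^{(i_k)}
    alpha = omega * (r_V @ r_V) / (g @ g)      # assumed adaptive step size
    return x + alpha * g
```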
For the MRABK method, Theorem 3.1 is given to illustrate its convergence.
Theorem 3.1. Assume that the linear system (1.1) is consistent, that ω ∈ (0, 2), and that, for a fixed partition V = {V_1, V_2, …, V_t} of [m], σ_max^2(A_{V_i}) ≤ β holds for each V_i ∈ V. Then, starting from any initial vector x_0 ∈ R(A^T), the iteration sequence {x_k}_{k=0}^∞ generated by the MRABK method converges to the unique least-norm solution x_⋆ = A^†b. Moreover, for any k ≥ 0, we have
and
Proof. From step 5 of Method 2 and A_{V_{i_k}} x_⋆ = b_{V_{i_k}}, we can get
By squaring the Euclidean norm on both sides of the above equation, it holds that
Substituting αk into this equality, we have
We see that the first of these inequalities holds since 2ω − ω^2 > 0 for ω ∈ (0, 2), and
From (2.5) in the proof of Theorem 2.1, we can obtain, for k=1,2,…, that
In particular, when k=0, we have
□
From (3.2), we can obtain
and it is not difficult to find that ρMRABK attains its minimum value at ω = 1, namely 1 − σ_min^2(A)/(β(t−1)), in which case ρMRABK = ρMRBK. However, as can be seen from steps 4 and 5 of Method 2, the MRABK method adopts the average block strategy and thereby avoids the pseudo-inverse calculation that appears in the x_k update of step 4 of the MRBK method. Therefore, the MRABK method may converge faster than the MRBK method in terms of computing time. The numerical results in Section 4 verify this conclusion.
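For completeness, the elementary calculation behind the choice ω = 1 (assuming, as the derivation above suggests, that ρMRABK depends on ω only through the factor 2ω − ω^2) is the following:

```latex
2\omega-\omega^{2} \;=\; 1-(1-\omega)^{2} \;\le\; 1
\qquad \text{for } \omega\in(0,2), \text{ with equality iff } \omega=1 .
```

Hence a convergence factor of the form 1 − (2ω − ω^2)c with c > 0 is minimized at ω = 1, where it equals 1 − c; taking c = σ_min^2(A)/(β(t−1)) recovers the stated minimum value.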
4. Experimental results
In this section, the efficiency of the MRBK and MRABK methods is verified through numerical experiments. We compare our two proposed methods with the GRK, MRK, RBK, RABK [35], and GRBK methods. In each iteration of the RBK, GRBK, and MRBK methods, we use CGLS [44] to solve the linear subproblems that arise. Additionally, we use the same randomized row partition {A_{V_1}, A_{V_2}, …, A_{V_t}} defined in (1.2) for the RBK, RABK, GRBK, MRBK, and MRABK methods; for the number of blocks, [36] shows that ⌈‖A‖_2^2⌉ is a good choice, so we uniformly set the number of blocks to t = ⌈‖A‖_2^2⌉, where ⌈·⌉ denotes rounding up to an integer. In the MRABK method, ω is set to 1 so that its convergence factor is minimized; similarly, ω = 1 is used in the RABK method. The effectiveness of the methods is assessed by the number of iteration steps (denoted "IT") and the computing time in seconds (denoted "CPU"). The reported IT and CPU values are the arithmetic means over 20 runs of each method. To illustrate the effectiveness of our proposed methods, we also compute the speed-up of the MRBK method relative to the MRK and GRBK methods, as well as the speed-up of the MRABK method relative to the MRBK method. The definitions of these speed-up values are provided below:
All experiments are conducted on a personal computer equipped with MATLAB (version R2022a), featuring an AMD Ryzen 7 8845H w/ Radeon(TM) 780M Graphics CPU clocked at 3.80 GHz, alongside 32.00 GB of RAM and running on the Windows 11 operating system.
For consistent linear systems (1.1), the coefficient matrix is either generated by the MATLAB function sprandn or taken from the SuiteSparse Matrix Collection [45]. For each tested coefficient matrix A, any zero rows are removed and A is normalized so that every row has unit Euclidean norm (i.e., A is a standardized matrix). The right-hand side is set to b = Ax_∗, where the vector x_∗ is generated using the MATLAB function randn. All computations start from the initial zero vector x_0 = 0 and stop once the relative solution error (RSE) at the current iteration satisfies RSE < 10^{−6} or once IT exceeds the limit of 200,000, where the RSE is defined as
where the minimum norm solution x⋆ is obtained using the MATLAB function lsqminnorm.
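For readers who wish to reproduce a similar setup outside MATLAB, the Python sketch below mimics the test-problem construction just described; this is our own simplified re-implementation (sprandn, randn, and lsqminnorm are MATLAB functions), and the RSE below uses the squared relative error, which is our assumption since the displayed formula is not reproduced here.

```python
import numpy as np

def make_test_problem(m, n, density=0.01, seed=0):
    """Consistent test system mimicking the setup above: a sparse Gaussian
    matrix with zero rows removed and every row normalized to unit norm,
    right-hand side b = A x*, and block number t = ceil(||A||_2^2)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n)) * (rng.random((m, n)) < density)
    A = A[np.abs(A).sum(axis=1) > 0, :]             # remove zero rows
    A /= np.linalg.norm(A, axis=1, keepdims=True)   # standardize the rows
    x_star = rng.standard_normal(n)
    b = A @ x_star
    t = int(np.ceil(np.linalg.norm(A, 2) ** 2))     # t = ceil(||A||_2^2)
    return A, b, x_star, t

def rse(x, x_star):
    # assumed definition: squared relative solution error
    return np.linalg.norm(x - x_star) ** 2 / np.linalg.norm(x_star) ** 2
```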
Define the density of a matrix as

density = (number of nonzero entries of A) / (m × n).
For the first kind of sparse matrix A, we set the density parameter to 0.01, i.e., A = sprandn(m, n, 0.01). Tables 1 and 2 list the IT and CPU for the GRK, MRK, RBK, RABK, GRBK, MRBK, and MRABK methods, together with the speed-up values for several methods.
From Table 1, we find that when m = 6000 and n = 1000, 1500, 2000, 2500, or 3000, both of our proposed methods, MRBK and MRABK, outperform the other methods in terms of CPU time. We now focus on the MRK, GRBK, MRBK, and MRABK methods. The MRBK method, as a block-improved version of the MRK method, is significantly more efficient in terms of both IT and CPU time; the corresponding SU1 value is at least 6.44 and at most 85.74. We analyzed the convergence factors of the GRBK and MRBK methods in Section 2 and concluded that their IT should be very close when the size of A is large. Table 1 confirms that the IT of these two methods is very close, with the IT of the MRBK method slightly smaller than that of the GRBK method; the SU2 value is at least 2.04 and at most 2.82. The MRABK method has the shortest CPU time among all the above methods, and its SU3 value relative to the MRBK method is at least 1.99 and at most 2.82.
From Table 2, we find that when n = 6000 and m = 1000, 1500, 2000, 2500, or 3000, the MRBK and MRABK methods remain superior to the other methods. Among all the methods, the MRBK method requires the fewest IT, while the MRABK method has the shortest CPU time. Under these conditions, the SU1 value is at least 75.35 and at most 155.46. Even though the IT of the MRBK and GRBK methods are almost the same, the CPU time of the MRBK method is better than that of the GRBK method. Compared with the MRBK method, the SU3 value of the MRABK method is at least 1.99 and at most 2.89.
The second type of coefficient matrices, selected from the SuiteSparse Matrix Collection [45], are derived from different applications, such as combinatorial problems and linear programming problems. These matrices possess some special structures and characteristics, such as being thin (m>n) (e.g., Franz9, GL7d12), fat (m<n) (e.g., p6000, lp_80bau3b), or square (m=n) (e.g., Trefethen_700); we list details about them in Table 3, including their size, density, and cond(A). For these matrices, we implement the GRK, MRK, RBK, GRBK, RABK, MRBK, and MRABK methods, and list the IT and CPU time for each method in Table 3.
In Table 3, we observe that the MRABK method again has the shortest CPU time. When the coefficient matrix is GL7d12 or lp_80bau3b, the MRBK method requires the smallest IT; when the coefficient matrix is Trefethen_700 or p6000, its IT equals that of the GRBK method. Compared to the MRK method, the MRBK method exhibits a minimal SU1 value of 1.60 and a maximal SU1 value of 60.55. In comparison to the GRBK method, the MRBK method has a minimal SU2 value of 1.19 and a maximal SU2 value of 2.29. Furthermore, compared to the MRBK method, the MRABK method has a minimal SU3 value of 1.09 and a maximal SU3 value of 3.33.
To further assess the performance of the various block Kaczmarz methods, Figures 1 and 2 display the RSE versus IT and the RSE versus CPU time for the different block Kaczmarz methods. As Figures 1 and 2 show, as IT increases, the RSE of the MRBK and MRABK methods decreases faster than that of the RBK and RABK methods, and the MRBK method has the fastest RSE decline among all block Kaczmarz methods. As CPU time increases, the RSE of the MRABK method decreases the fastest, followed by the RABK and MRBK methods. Both the MRBK and MRABK methods are ahead of the RBK and GRBK methods.
5. Conclusions
This paper presents two new Kaczmarz methods (MRBK and MRABK) for solving consistent linear systems. Both methods use a uniform randomized partition of the rows of matrix A to construct the row submatrices {A_{V_1}, A_{V_2}, …, A_{V_t}}. In each iteration, the MRBK method updates the solution vector by projecting x_k onto the solution space of A_{V_{i_k}} x = b_{V_{i_k}}, where V_{i_k} is selected according to i_k = argmax_{1≤i≤t} ‖b_{V_i} − A_{V_i} x_k‖_2^2, ensuring that the block with the largest residual is eliminated first and leading to rapid convergence. Building upon the MRBK method, the MRABK method introduces an adaptive step size, eliminating the need to compute the pseudo-inverse of the row submatrix A_{V_{i_k}} during the update of x_k and thereby reducing the cost of each iteration. We provide a comprehensive convergence analysis for both methods and conduct numerical experiments to validate their effectiveness. Both the theoretical analysis and the numerical results demonstrate the superiority of our proposed methods over other Kaczmarz methods, including the GRK, MRK, RBK, and GRBK methods. In addition, some valuable topics deserve further study, such as fully exploiting the structure and properties of A, devising more efficient row partitions of A, and finding better step sizes for the MRABK method.
Author contributions
Wen-Ning Sun: Conceptualization, methodology, method design and implementation, numerical experiments and visualization, writing original drafts, reviewing and editing drafts; Mei Qin: Methodology, reviewing, editing drafts. All authors have reviewed and approved the final version of the manuscript prior to its publication.
Conflict of interest
The authors declare no competing interests.