
The PageRank model is widely utilized for analyzing a variety of scientific problems beyond its original application in modeling web search engines. In recent years, considerable research effort has focused on developing high-performance iterative methods to solve this model, particularly when the dimension is exceedingly large. However, due to the ever-increasing extent and size of data networks in various applications, the computational requirements of the PageRank model continue to grow. This has led to the development of new techniques that aim to reduce the computational complexity required for the solution. In this paper, we present a recursive 5-type lumping algorithm combined with a two-stage elimination strategy that leverages information about both the nonzero structure of the underlying network and the nonzero values of the PageRank coefficient matrix. This method reduces the initial PageRank problem to the solution of a substantially smaller and sparser linear system. As a result, it leads to significant cost reductions for computing PageRank solutions, particularly in scenarios involving large and/or multiple damping factors. Numerical experiments conducted on over 50 real-world networks demonstrate that the proposed methods can effectively exploit the characteristics of PageRank problems for efficient computations.
Citation: Zhao-Li Shen, Yu-Tong Liu, Bruno Carpentieri, Chun Wen, Jian-Jun Wang. Recursive reordering and elimination method for efficient computation of PageRank problems[J]. AIMS Mathematics, 2023, 8(10): 25104-25130. doi: 10.3934/math.20231282
Google proposed the PageRank model [1] to assess the relevance of web pages that match a user-defined search query. Since then, PageRank has been widely used in various fields beyond search engines; see for instance [2,3,4,5,6,7,8,9,10]. This widespread use is attributed to the simplicity of the model and the universality of its underlying principle. PageRank is often used to evaluate the centrality or importance of network nodes, which is measured by the stationary probability vector of a random process that visits nodes through their directed links. In detail, an adjacency matrix $G\in\mathbb{N}^{n\times n}$ stores the link structure of a network with $n$ nodes, where $G(i,j)=1$ when there is a directed edge from the $j$th node to the $i$th node. The transition probability matrix $P\in\mathbb{R}^{n\times n}$ of the random process is formed by
$$P(i,j)=\begin{cases}\dfrac{1}{\sum_{k=1}^{n}G(k,j)}, & \text{when } G(i,j)=1,\\[4pt] 0, & \text{otherwise.}\end{cases}\tag{1.1}$$
To guarantee the existence of a unique stationary distribution and the convergence of the Power iteration of the random walk process [11,12], the matrix P is typically transformed into a stochastic matrix A through
$$A=\alpha\tilde P+(1-\alpha)ve^{T}.\tag{1.2}$$
In (1.2), $\tilde P=P+vd^{T}$, where the binary vector $d\in\mathbb{N}^{n\times 1}$ has value 1 at the indices of nodes with no out-links; $v\in\mathbb{R}^{n\times 1}$ satisfies $0\le v\le 1$ and $\|v\|_{1}=1$; $e$ is a vector of all ones; and $0<\alpha<1$ is the damping factor, which assigns the probability that the traveler transfers by following directed links, while the rank-one term $ve^{T}$ represents the probabilities of transferring in other ways, such as jumping to a random unlinked node. The PageRank model can then be regarded as computing the eigenvector $x$ satisfying the equation
$$Ax=x,\qquad\|x\|_{1}=1,\qquad x>0.\tag{1.3}$$
As only the ratios between the importance values of the nodes are needed, the relative importance can also be obtained by solving the following linear system [13]
$$(I-\alpha P)x=v.\tag{1.4}$$
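To make the model concrete, here is a minimal MATLAB sketch that builds $P$ from a small adjacency matrix according to (1.1) and applies the Power method to the Google matrix (1.2); the example graph, variable names and tolerance are illustrative assumptions, not taken from the paper.

```matlab
% Minimal sketch: build P as in (1.1) and run the Power method on A from (1.2).
% Convention of the text: G(i,j) = 1 iff there is an edge from node j to node i.
n = 5;
G = sparse([2 3 3 4 5 1], [1 1 2 2 3 4], 1, n, n); % small example graph (assumed)
outdeg = full(sum(G, 1));                          % out-degree of node j = column sum j
P = G * spdiags(1 ./ max(outdeg, 1)', 0, n, n);    % column-normalize nondangling columns
alpha = 0.85;                                      % damping factor
v = ones(n, 1) / n;                                % uniform personalization vector
d = double(outdeg == 0)';                          % indicator of dangling nodes
x = v;                                             % starting vector with ||x||_1 = 1
for it = 1:1000
    % A*x with A = alpha*(P + v*d') + (1-alpha)*v*e', exploiting e'*x = 1
    xnew = alpha * (P * x + v * (d' * x)) + (1 - alpha) * v;
    if norm(xnew - x, 1) < 1e-10, x = xnew; break, end
    x = xnew;
end
x = x / norm(x, 1);                                % PageRank vector of (1.3)
```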
Since the PageRank model was proposed, the development of high-performance solvers for systems (1.3)–(1.4) has attracted much research attention. Classical iteration algorithms, e.g., the Power method, often prove effective when the damping factor takes a moderate value (e.g., Google recommended α=0.85 for search-engine ranking). On the other hand, as α approaches 1 (as required by some applications [14]), the convergence speed of stationary iterative methods deteriorates sharply, necessitating more robust solvers. Extrapolation algorithms [15,16,17,18], adaptive methods [19], multigrid solvers [20,21], inner-outer strategies [22,23,24,25,26], Krylov subspace algorithms [27,28,29,30] and methods that combine stationary iterations with Krylov subspace algorithms [31,32,33,34] have been used in the quest for robust solvers. It has been demonstrated that in some problems involving large damping factors, Krylov subspace methods can save over 50% of the CPU time compared with classical iterative methods [35]. Nonetheless, due to the ever-increasing extent and size of data networks in applications, the computational demands of the PageRank model continue to grow, motivating the development of new techniques that can reduce the computational complexity required for the solution.
One relevant research direction classifies network nodes into families of special types, e.g., those with and without out-links, and reorders them accordingly so that only the PageRank values of one type require the solution of a linear system, while the PageRank values of the remaining nodes can be obtained efficiently using vector-vector and matrix-vector operations. If the linear system corresponding to one type of nodes (called the "kernel linear system") is significantly smaller than the original system, the costs associated with the solution of the PageRank model can be greatly decreased. Some of the relevant methods reorder the nodes into two types [13], three types [36], five types [37], and include recursive variants [38]. A different research direction exploits the similarities between rows of the adjacency matrix G to either devise elimination algorithms [39] that increase the sparsity of the PageRank problem (1.4), or find suitable permutations that reveal off-diagonal low-rank blocks in the coefficient matrix I−αP [40,41], leading to highly efficient compressed matrix representations of (1.4). The effectiveness of these strategies depends on the structural characteristics of the underlying network; however, they have proved useful for simplifying realistic PageRank calculations in many contexts. A deeper understanding of the structural properties of existing networks arising from different backgrounds, the development of new strategies to further exploit these properties, and the design of fast solvers for the simplified problem are issues that still require additional research.
In this paper, we present a recursive 5-type reordering approach and a two-stage elimination algorithm that combine information about both the nonzero structure of the underlying network and the numerical properties of the PageRank model to reduce the computational complexity. Moreover, the effect of the elimination strategy on the spectrum distribution is shown to be controlled. The remaining content is structured as follows. Section 2 covers the most commonly used methods for classifying graph nodes into different types. Section 3 presents a recursive 5-type reordering algorithm. Section 4 introduces a new two-stage elimination strategy, which extends the work proposed in [39], and analyzes its impact on the eigenvalue distribution of the PageRank linear system. Section 5 presents experiments on real-world networks. Finally, Section 6 discusses some remarks arising from this study and perspectives for future work. MATLAB notation [42] is used throughout the paper.
Current approaches to classifying nodes in the web adjacency graph begin with Lee, Golub, and Zenios' work [43], which takes advantage of the presence of dangling nodes, i.e., nodes with zero out-degree. The method aggregates all dangling nodes into one node, and then computes the PageRank values by solving two smaller linear systems. Langville and Meyer refined this initial idea [13]. They classify the nodes of the web adjacency graph into dangling nodes (D) and nondangling nodes (ND); we refer to this classification scheme as Lump-2. They also reorder the solution into two subvectors corresponding to nondangling and dangling nodes, and permute the linear system symmetrically accordingly. The permuted system reads:
$$\tilde x^{T}(I-\alpha\tilde P)=\tilde v^{T},\tag{2.1}$$
where
$$\tilde P=\Pi P\Pi^{T}=\begin{bmatrix}\tilde P_{11} & \tilde P_{12}\\ 0 & 0\end{bmatrix},\tag{2.3}$$

with the block rows and columns ordered as [ND, D],
and $\Pi$ is the permutation matrix*, $\tilde x^{T}=x^{T}\Pi^{T}$, $\tilde v^{T}=v^{T}\Pi^{T}$. Note that,
*Π is obtained through reordering the rows and columns of an identity matrix according to the lumping algorithm.
$$(I-\alpha\tilde P)^{-1}=\begin{bmatrix}(I-\alpha\tilde P_{11})^{-1} & \alpha(I-\alpha\tilde P_{11})^{-1}\tilde P_{12}\\ 0 & I\end{bmatrix}.\tag{2.4}$$
Thus, the permuted PageRank vector $\tilde x^{T}=\tilde v^{T}(I-\alpha\tilde P)^{-1}$ can be formed as:
$$\tilde x^{T}=\left(\tilde v_{1}^{T}(I-\alpha\tilde P_{11})^{-1},\;\alpha\tilde v_{1}^{T}(I-\alpha\tilde P_{11})^{-1}\tilde P_{12}+\tilde v_{2}^{T}\right).\tag{2.5}$$
After splitting $\tilde x^{T}$ into $[\tilde x_{1}^{T},\tilde x_{2}^{T}]$, where $\tilde x_{1}^{T}$ and $\tilde x_{2}^{T}$ store the PageRank values of the nondangling and dangling nodes, respectively, the solution procedure can be implemented as
$$\tilde x_{1}^{T}=\tilde v_{1}^{T}(I-\alpha\tilde P_{11})^{-1},\tag{2.6}$$

$$\tilde x_{2}^{T}=\alpha\tilde x_{1}^{T}\tilde P_{12}+\tilde v_{2}^{T}.\tag{2.7}$$
Iterative algorithms, such as Krylov subspace methods, need only solve the smaller linear system (2.6) (which we refer to as the "kernel linear system") to compute the PageRank values of the nondangling nodes, i.e., the subvector $\tilde x_{1}^{T}$. Then, the PageRank values of the dangling nodes, i.e., the subvector $\tilde x_{2}^{T}$, can be obtained at the negligible cost of one matrix-vector multiplication and one vector-vector addition. Thus, if the number of nondangling nodes is substantially smaller than the total dimension, the time needed to solve the PageRank problem can be drastically reduced. This method can provide a fivefold performance speedup over the Power method when more than 80% of the total nodes are dangling nodes.
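As a concrete illustration of (2.6)–(2.7), a minimal MATLAB sketch of the Lump-2 procedure could read as follows; the sparse backslash solve stands in for an iterative kernel solver, and all variable names are assumptions for illustration.

```matlab
% Sketch of Lump-2: reorder [ND, D], solve the kernel system (2.6), recover (2.7).
outdeg = full(sum(G, 1));                        % out-degrees (column sums, as in (1.1))
nd = find(outdeg > 0);  dg = find(outdeg == 0);  % nondangling / dangling indices
perm = [nd, dg];                                 % permutation encoded by Pi
Pp = P(perm, perm);  vp = v(perm);               % Ptilde = Pi*P*Pi', vtilde = Pi*v
k = numel(nd);
P11 = Pp(1:k, 1:k);  P12 = Pp(1:k, k+1:end);
x1 = (speye(k) - alpha * P11') \ vp(1:k);        % kernel system (2.6) in column form
x2 = alpha * P12' * x1 + vp(k+1:end);            % cheap recovery step (2.7)
x = zeros(n, 1);  x(perm) = [x1; x2];            % permute back to the original order
x = x / norm(x, 1);
```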
Following this idea, a number of enhancements have been suggested to further reduce the size of the kernel system. They proceed in one of two ways: either categorizing the nodes into more classes, or applying the reordering recursively to a given class. For these two research directions, we briefly review the representative methods before introducing a new reordering algorithm that combines the different strategies.
Lin et al. further classify the nodes in the web adjacency graph into three categories: strongly nondangling nodes (S), weakly nondangling nodes (W), and dangling nodes (D), with the new type W representing nodes that only point to dangling nodes [36]. We refer to Lin et al.'s method as Lump-3. By reordering the nodes as [S,W,D] and permuting the problem symmetrically, the matrix ˜P in the permuted PageRank linear system (2.1) becomes
$$\tilde P=\Pi P\Pi^{T}=\begin{bmatrix}\tilde P_{11} & \tilde P_{12} & \tilde P_{13}\\ 0 & 0 & \tilde P_{23}\\ 0 & 0 & 0\end{bmatrix},\tag{2.8}$$

with the block rows and columns ordered as [S, W, D],
while $\tilde x$ and $\tilde v$ are partitioned according to node type: $\tilde x=[\tilde x_{1}^{T},\tilde x_{2}^{T},\tilde x_{3}^{T}]^{T}$, $\tilde v=[\tilde v_{1}^{T},\tilde v_{2}^{T},\tilde v_{3}^{T}]^{T}$. Then, the permuted problem is expressed as
$$[\tilde x_{1}^{T},\tilde x_{2}^{T},\tilde x_{3}^{T}]\begin{bmatrix}I-\alpha\tilde P_{11} & -\alpha\tilde P_{12} & -\alpha\tilde P_{13}\\ 0 & I & -\alpha\tilde P_{23}\\ 0 & 0 & I\end{bmatrix}=[\tilde v_{1}^{T},\tilde v_{2}^{T},\tilde v_{3}^{T}],\tag{2.9}$$
and $\tilde x_{1}^{T},\tilde x_{2}^{T},\tilde x_{3}^{T}$ can be obtained using the formulas
$$\tilde x_{1}^{T}=\tilde v_{1}^{T}(I-\alpha\tilde P_{11})^{-1};\tag{2.10}$$

$$\tilde x_{2}^{T}=\alpha\tilde x_{1}^{T}\tilde P_{12}+\tilde v_{2}^{T};\tag{2.11}$$

$$\tilde x_{3}^{T}=\alpha\tilde x_{1}^{T}\tilde P_{13}+\alpha\tilde x_{2}^{T}\tilde P_{23}+\tilde v_{3}^{T}.\tag{2.12}$$
Iterative methods are only used to solve the linear system (2.10) for the subvector of PageRank values of S; since S is never larger than the set ND of the Lump-2 method, this system may be less expensive to solve.
Langville and Meyer generalize the Lump-2 and Lump-3 algorithms to a recursive version that repeatedly applies the Lump-2 reordering to the upper-left corner of the permuted transition matrix $\tilde P$ [38]. We refer to this method as Lump-R. The structure of the transition matrix after k−1 reordering steps is described below:
$$\tilde P=\Pi P\Pi^{T}=\begin{bmatrix}\tilde P_{11} & \tilde P_{12} & \tilde P_{13} & \cdots & \tilde P_{1k}\\ & 0 & \tilde P_{23} & \cdots & \tilde P_{2k}\\ & & 0 & \cdots & \tilde P_{3k}\\ & & & \ddots & \vdots\\ & & & & 0\end{bmatrix}.\tag{2.13}$$
Meanwhile, $\tilde x$ and $\tilde v$ are permuted accordingly as follows: $\tilde x=[\tilde x_{1}^{T},\tilde x_{2}^{T},\tilde x_{3}^{T},\cdots,\tilde x_{k}^{T}]^{T}$, $\tilde v=[\tilde v_{1}^{T},\tilde v_{2}^{T},\tilde v_{3}^{T},\cdots,\tilde v_{k}^{T}]^{T}$. From (2.13), $\tilde x$ can be computed as follows:
$$\tilde x_{1}^{T}=\tilde v_{1}^{T}(I-\alpha\tilde P_{11})^{-1},\tag{2.14}$$

$$\tilde x_{i}^{T}=\alpha\sum_{j=1}^{i-1}\tilde x_{j}^{T}\tilde P_{ji}+\tilde v_{i}^{T},\qquad\text{for } i=2,3,\cdots,k.\tag{2.15}$$
The Lump-2 and Lump-3 techniques are instances of this recursive reordering procedure for k=2 and k=3, respectively. This method is useful for uncovering the network structure in greater depth, and it may reduce the dimension of the "kernel linear system" further as k increases, but the reordering phase requires more time [38]. As a result, it is recommended to terminate the recursive process when the upper-left submatrix $\tilde P_{11}$ contains few dangling nodes.
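A sketch of the reordering phase of Lump-R, under the same adjacency convention as above, might look as follows; kmax and the stopping heuristic are assumptions.

```matlab
% Sketch of Lump-R: repeatedly split the current top-left block into
% [nondangling, dangling] until it contains few dangling nodes.
perm = 1:size(G, 1);          % global permutation being accumulated
head = numel(perm);           % size of the current top-left block
for step = 1:kmax             % kmax recursive steps (assumed parameter)
    B = G(perm(1:head), perm(1:head));           % current kernel block
    outdeg = full(sum(B, 1));
    nd = find(outdeg > 0);  dg = find(outdeg == 0);
    if numel(dg) < 0.01 * head, break, end       % stop if few dangling nodes (heuristic)
    perm(1:head) = perm([nd, dg]);               % push dangling nodes to the back
    head = numel(nd);                            % recurse on the nondangling block
end
```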
Similarly, the previously discussed classification for nodes with zero out-links can be applied to nodes with zero in-links, i.e., nodes that are not referenced by other nodes. These are referred to as unreferenced nodes (UR), in contrast to referenced nodes (R), which are referenced by other nodes. Taking the R and UR categories into consideration, Yu et al. further split the Lump-3 types into five categories: strongly nondangling & referenced nodes (S&R), strongly nondangling & unreferenced nodes (S&UR), dangling & referenced nodes (D&R), dangling & unreferenced nodes (D&UR), and weakly nondangling nodes (W). This technique is known as Lump-5. Figure 1 depicts the relationship between these five categories.
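Under the convention of (1.1), the five categories can be computed with a few sparse reductions and intersections; a minimal sketch with assumed variable names:

```matlab
% Sketch of the five-type node classification used by Lump-5.
outdeg = full(sum(G, 1));                 % out-degree of node j (column sum)
indeg  = full(sum(G, 2))';                % in-degree of node i (row sum)
D  = (outdeg == 0);                       % dangling nodes
ND = ~D;
W  = ND & (full(sum(G(ND, :), 1)) == 0);  % nondangling nodes linking only to dangling ones
S  = ND & ~W;                             % strongly nondangling nodes
UR = (indeg == 0);                        % unreferenced nodes
perm = [find(S & ~UR), find(S & UR), find(W), find(D & ~UR), find(D & UR)];
Pp = P(perm, perm);                       % exhibits the block structure of (2.16)
```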
By reordering the nodes as [S&R,S&UR,W,D&R,D&UR] and permuting the problem symmetrically, the matrix ˜P in the permuted PageRank linear system (2.1) becomes
$$\tilde P=\Pi P\Pi^{T}=\begin{bmatrix}\tilde P_{11} & 0 & \tilde P_{13} & \tilde P_{14} & 0\\ \tilde P_{21} & 0 & \tilde P_{23} & \tilde P_{24} & 0\\ 0 & 0 & 0 & \tilde P_{34} & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\end{bmatrix},\tag{2.16}$$

with the block rows and columns ordered as [S&R, S&UR, W, D&R, D&UR],
and $\tilde x$ and $\tilde v$ are reordered accordingly as $\tilde x=[\tilde x_{1}^{T},\tilde x_{2}^{T},\tilde x_{3}^{T},\tilde x_{4}^{T},\tilde x_{5}^{T}]^{T}$, $\tilde v=[\tilde v_{1}^{T},\tilde v_{2}^{T},\tilde v_{3}^{T},\tilde v_{4}^{T},\tilde v_{5}^{T}]^{T}$. The permuted problem can be written as
$$[\tilde x_{1}^{T},\tilde x_{2}^{T},\tilde x_{3}^{T},\tilde x_{4}^{T},\tilde x_{5}^{T}]\begin{bmatrix}I-\alpha\tilde P_{11} & 0 & -\alpha\tilde P_{13} & -\alpha\tilde P_{14} & 0\\ -\alpha\tilde P_{21} & I & -\alpha\tilde P_{23} & -\alpha\tilde P_{24} & 0\\ 0 & 0 & I & -\alpha\tilde P_{34} & 0\\ 0 & 0 & 0 & I & 0\\ 0 & 0 & 0 & 0 & I\end{bmatrix}=[\tilde v_{1}^{T},\tilde v_{2}^{T},\tilde v_{3}^{T},\tilde v_{4}^{T},\tilde v_{5}^{T}],\tag{2.17}$$
and $\tilde x_{1}^{T},\tilde x_{2}^{T},\tilde x_{3}^{T},\tilde x_{4}^{T},\tilde x_{5}^{T}$ can be calculated as follows:
$$\tilde x_{1}^{T}=(\tilde v_{1}^{T}+\alpha\tilde v_{2}^{T}\tilde P_{21})(I-\alpha\tilde P_{11})^{-1};\tag{2.18}$$

$$\tilde x_{2}^{T}=\tilde v_{2}^{T};\tag{2.19}$$

$$\tilde x_{3}^{T}=\alpha(\tilde x_{1}^{T}\tilde P_{13}+\tilde x_{2}^{T}\tilde P_{23})+\tilde v_{3}^{T};\tag{2.20}$$

$$\tilde x_{4}^{T}=\alpha(\tilde x_{1}^{T}\tilde P_{14}+\tilde x_{2}^{T}\tilde P_{24}+\tilde x_{3}^{T}\tilde P_{34})+\tilde v_{4}^{T};\tag{2.21}$$

$$\tilde x_{5}^{T}=\tilde v_{5}^{T}.\tag{2.22}$$
This method detects more network structural features than the prior reordering algorithms and generates a smaller "kernel linear system" for the test problems in [37]. However, its performance on very large networks has yet to be demonstrated. Furthermore, it will be interesting to see whether a recursive implementation of this method can further reduce the dimension of the "kernel linear system".
This section presents the Lump-R5 algorithm, which applies Lump-5 recursively to the current "kernel linear system" until the dimension cannot be reduced further or the algorithm is executed a predetermined number of times. Throughout each phase of the recursive procedure, the nodes can be classified into five distinct categories. Although this algorithm can be conveniently described as a recursive call to itself, we provide formulas for calculating the PageRank value of each node category. The purpose of these formulas is to shed light on efficient implementations of this algorithm, as opposed to a more intuitive but less efficient recursive implementation, and to demonstrate how the method can be adapted to solve PageRank linear systems with multiple damping factors.
In the permuted problem (2.1), after $k$ iterations of Lump-5 reordering there are $n=4k+1$ classes of nodes, and $\tilde x$ and $\tilde v$ are reordered as $\tilde x=[\tilde x_{1}^{T},\tilde x_{2}^{T},\tilde x_{3}^{T},\cdots,\tilde x_{n}^{T}]^{T}$, $\tilde v=[\tilde v_{1}^{T},\tilde v_{2}^{T},\tilde v_{3}^{T},\cdots,\tilde v_{n}^{T}]^{T}$. Below are the formulas used to compute $\tilde x_{1}^{T},\tilde x_{2}^{T},\tilde x_{3}^{T},\cdots,\tilde x_{n}^{T}$. Note that $\tilde x_{1}$ is the only subvector whose computation requires solving a linear system:
$$\tilde x_{1}^{T}=\Big(\tilde v_{1}^{T}+\alpha\sum_{j=0}^{k-1}\tilde x_{4j+2}^{T}\tilde P_{4j+2,\,1}\Big)(I-\alpha\tilde P_{11})^{-1}.\tag{3.1}$$
To express the computation formulas for the PageRank of the remaining nodes, we introduce an integer l (k−1⩾l⩾0):
$$\tilde x_{i}^{T}=\alpha\sum_{j=1}^{k-l-1}\tilde x_{4j+i}^{T}\tilde P_{4j+i,\,i}+\tilde v_{i}^{T},\qquad\text{when } i=4l+2\le n;\tag{3.2}$$

$$\tilde x_{i}^{T}=\alpha\sum_{j=1}^{i-1}\tilde x_{j}^{T}\tilde P_{ji}+\alpha\sum_{j=1}^{k-l-1}\tilde x_{4j+i-1}^{T}\tilde P_{4j+i-1,\,i}+\tilde v_{i}^{T},\qquad\text{when } i=4l+3\le n;\tag{3.3}$$

$$\tilde x_{i}^{T}=\alpha\sum_{j=1}^{i-1}\tilde x_{j}^{T}\tilde P_{ji}+\alpha\sum_{j=1}^{k-l-1}\tilde x_{4j+i-2}^{T}\tilde P_{4j+i-2,\,i}+\tilde v_{i}^{T},\qquad\text{when } i=4l+4\le n;\tag{3.4}$$

$$\tilde x_{i}^{T}=\alpha\sum_{j=1}^{k-l-1}\tilde x_{4j+i-3}^{T}\tilde P_{4j+i-3,\,i}+\tilde v_{i}^{T},\qquad\text{when } i=4l+5\le n.\tag{3.5}$$
According to (3.2), we first compute $\tilde x_{4l+2}^{T}$ in decreasing order of $l$. Then, $\tilde x_{1}^{T}$ and $\tilde x_{4l+5}^{T}$ can be computed using Eqs (3.1) and (3.5), where (3.1) is solved by iterative methods. Following that, $\tilde x_{4l+3}^{T}$ and $\tilde x_{4l+4}^{T}$ can be calculated using Eqs (3.3) and (3.4). The PageRank subvectors of the $n$ node classes are concatenated and permuted at this stage, to arrive at the required solution $x^{T}$. The computations of $\tilde x_{4l+2}^{T}$, $\tilde x_{4l+3}^{T}$, $\tilde x_{4l+4}^{T}$ and $\tilde x_{4l+5}^{T}$ can be done in parallel. In our cases, where $l$ is not excessively large, the cost of computing these quantities is negligible compared to the cost of solving for $\tilde x_{1}^{T}$. Therefore, we will not discuss implementation optimization in detail here.
In conclusion, the Lump-R5 algorithm is depicted in Algorithm 1 shown below.
Algorithm 1 Recursive Lump-5 reordering method (Lump-R5).
Input: Damping factor α, convergence tolerance tol, number of recursive steps k.
1: Permute the transition matrix P in the original linear system, so that nodes of the same class are clustered.
2: Let l = k−1.
3: while l ⩾ 0 do
4:  Compute $\tilde x_{4l+2}^{T}$ through (3.2).
5:  l = l−1.
6: end while
7: Use iterative algorithms to solve for $\tilde x_{1}^{T}$ according to (3.1).
8: Compute the PageRank values of the remaining nodes directly by (3.3)–(3.5).
9: Set $\tilde x=[\tilde x_{1}^{T},\tilde x_{2}^{T},\tilde x_{3}^{T},\cdots,\tilde x_{n}^{T}]^{T}$, compute the PageRank vector $x^{T}=\tilde x^{T}\Pi$, and normalize $x^{T}=x^{T}/\|x^{T}\|_{1}$.
10: return x
Here, the computational costs of the lumping algorithms are compared approximately. Because the Lump-2 and Lump-3 algorithms are instances of the Lump-R algorithm, and Lump-5 is an instance of Lump-R5, we only compare Lump-R5 and Lump-R. The cost of each recursive step for these two recursive methods comes mainly from classifying nodes into the special categories. Note that the classification can be conducted using the binary information contained in the adjacency matrix G. Lump-R requires finding the dangling nodes, which necessitates computing the number of nonzero elements in each row of G. This costs $s_{1}\in[nnz(G)-n,\,nnz(G)]$ integer additions. For Lump-R5, it is necessary to identify the dangling nodes, the unreferenced nodes, and the weakly nondangling nodes; the required five categories of nodes are then determined by performing some straightforward intersection operations. Finding the unreferenced nodes requires calculating the number of non-zero elements in each column of G, which costs $s_{2}\in[nnz(G)-n,\,nnz(G)]$ integer additions. Finally, finding the weakly nondangling nodes requires computing the row sums of the submatrix G(nondangling, nondangling) corresponding to the nondangling nodes, which costs $s_{3}\in[nnz(G(\text{nondangling},\text{nondangling}))-\text{length}(\text{nondangling}),\,nnz(G(\text{nondangling},\text{nondangling}))]$ operations. In general, (1) the cost of this last step is less than that of finding the dangling nodes, and the greater the proportion of dangling nodes, the lower this cost; (2) a recursive step of Lump-R5 requires approximately two to three times as many integer additions as one of Lump-R. Note, additionally, that: (1) the computational cost of these algorithms depends on the number of nonzero elements and the number of dangling nodes of the upper-left block at each recursive step, and is thus problem-dependent; (2) the storage format affects the computational cost in actual computer implementations, e.g., computing column sums on the MATLAB platform takes significantly less time than computing row sums; and (3) data extraction and redistribution operations also incur significant costs that are platform-dependent and difficult to account for.
We provide an insight into the merits of the Lump-2, Lump-3, Lump-R, Lump-5 and the proposed Lump-R5 methods in reducing the dimension of the PageRank linear system on a 3774768×3774768 adjacency matrix named "patents", downloaded from https://sparse.tamu.edu/, using 4 recursive steps for Lump-R and Lump-R5.
Figures 2 and 3, showing the original and reordered matrix patterns for this problem, reveal the block structures produced by the five reordering methods. Nondangling nodes represent the largest proportion, while dangling nodes represent the second largest (see Figure 3 (left)). There are many weakly nondangling nodes and unreferenced dangling nodes, suggesting that the dimension of the "kernel linear system" can be decreased even further than when only the dangling nodes are considered. Indeed, it is not necessary to split the dangling nodes into dangling & referenced and dangling & unreferenced nodes when only the "kernel linear system" dimension is of interest. Figure 3 (right) shows that the proportion of strongly nondangling & referenced nodes decreases with each recursive step, and the rate of this decrease diminishes as the recursion proceeds. A similar decrease in the proportion of nondangling nodes can be observed in Figure 2. It would be interesting to confirm this phenomenon in general. Figures 2 and 3 indicate that the Lump-R5 method produces the smallest upper-left block of the five reordering methods, which is a significant advantage from a computational viewpoint. In Section 5, we test the effectiveness of these methods on a large set of network matrices. However, some general observations can already be made based on this analysis.
● Classifying nodes into several types requires searching and permutation operations. Typically, these costs are small compared to the cost of solving the PageRank linear system. Therefore, if the reordering procedure significantly reduces the size of the "kernel linear system", its implementation is justified, especially when multiple problem instances on the same network are solved.
● For the PageRank problem on a fixed network with multiple values of the damping factor, the Lump-2, Lump-3, and Lump-R methods produce "kernel linear systems" of the shifted form $(I-\alpha_{i}P)x=v$ with collinear right-hand sides (see Eqs (2.6), (2.10) and (2.14)), and these shifted systems can be solved simultaneously using Krylov subspace algorithms. This observation is important for an efficient implementation. Although Lump-5 and Lump-R5 do not naturally maintain the collinearity of the right-hand sides, as shown by Eqs (2.18) and (3.1), the sequence of shifted linear systems with $s$ damping factors, $\tilde x_{1}^{T}=(\tilde v_{1}^{T}+\alpha_{i}\tilde v_{2}^{T}\tilde P_{21})(I-\alpha_{i}\tilde P_{11})^{-1}$, $1\le i\le s$, can be transformed into a sequence of shifted linear systems with two separate right-hand sides, $[\tilde x_{11}^{T},\tilde x_{12}^{T}]=[\tilde v_{1}^{T},\,\tilde v_{2}^{T}\tilde P_{21}](I-\alpha_{i}\tilde P_{11})^{-1}$, so that block shifted Krylov methods can be applied to solve them simultaneously.
● If networks lack a notable special structure that could yield a significantly smaller "kernel linear system", it can be worth considering some combined strategy that splits the matrix into a linear combination of two matrices, one of which can be reordered effectively.
● All of the reordering methods discussed in this section produce a "kernel linear system" of the form
$$\tilde x_{1}^{T}(I-\alpha\tilde P_{11})=b_{1}^{T},\tag{3.6}$$
where $\tilde P_{11}$ is the upper-left block of the permuted transition matrix $\tilde P$, and $b_{1}^{T}$ denotes the right-hand side, e.g., $b_{1}^{T}=\tilde v_{1}^{T}$ for Lump-2, Lump-3 and Lump-R, and $b_{1}^{T}=\tilde v_{1}^{T}+\alpha\tilde v_{2}^{T}\tilde P_{21}$ for Lump-5. The following theorem shows that some similarities exist between the spectrum ranges and the matrix types of the original PageRank system and its "kernel linear system".
Theorem 3.1. The coefficient matrix $A$ of the original PageRank system (1.4) and the coefficient matrix $I-\alpha\tilde P_{11}$ of the "kernel linear system" (3.6) are both nonsingular M-matrices. The moduli of the eigenvalues of these two matrices lie in the interval $[1-\alpha,1+\alpha]$.
Proof. According to (1.1), $P\ge 0$ and $\rho(P)\le\|P\|_{1}=\max_{j}\big(\sum_{i=1}^{n}P(i,j)\big)\le 1$. Because $0<\alpha<1$, the PageRank coefficient matrix $A=I-\alpha P$ is a nonsingular M-matrix, and the moduli of its eigenvalues lie in the interval $[1-\alpha,1+\alpha]$. As $\tilde P$ and $\tilde A$ are the symmetrically permuted versions of $P$ and $A$, respectively, the above analysis and conclusion also apply to $\tilde P$ and $\tilde A$. Because $\tilde P_{11}$ in the "kernel linear system" (3.6) is a submatrix of $\tilde P$, we get $\tilde P_{11}\ge 0$ and $\rho(\tilde P_{11})\le\|\tilde P_{11}\|_{1}=\max_{j}\big(\sum_{i}\tilde P_{11}(i,j)\big)\le\max_{j}\big(\sum_{i}P(i,j)\big)=\|P\|_{1}\le 1$. Therefore, the coefficient matrix $I-\alpha\tilde P_{11}$ of the "kernel linear system" remains a nonsingular M-matrix, and the moduli of its eigenvalues continue to lie in the interval $[1-\alpha,1+\alpha]$.
Therefore, any efficient solver for the original PageRank system (1.4) can be used to solve the "kernel linear system" corresponding to any of the described reordering schemes.
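The statement of Theorem 3.1 is easy to check numerically on small random instances; the snippet below is purely illustrative (random data and assumed names).

```matlab
% Illustrative check of Theorem 3.1 on a random transition matrix.
n = 200;  alpha = 0.99;
G = sprand(n, n, 0.05) > 0;                        % random binary adjacency (assumed)
outdeg = full(sum(G, 1));
P = double(G) * spdiags(1 ./ max(outdeg, 1)', 0, n, n);
lam = eig(full(speye(n) - alpha * P));             % eigenvalues of A = I - alpha*P
fprintf('moduli in [%.4f, %.4f]; bound [%.4f, %.4f]\n', ...
        min(abs(lam)), max(abs(lam)), 1 - alpha, 1 + alpha);
```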
● Some bioinformatics networks have special structures that have the potential to accelerate PageRank computations but have not yet been fully utilized. Figure 4 depicts the pattern structure of the protein-protein interaction networks of Human, fruit fly, mouse and yeast downloaded from the Molecular INTeraction Database https://mint.bio.uniroma2.it/index.php/download/. It is clear that such structure is preferable for constructing approximate inverses or preconditioners. In addition, the zoom-in section demonstrates the existence of nested-block structures and similar row sparsity-patterns, which can be exploited using methods such as [44] and [39,40], respectively.
The two essential components of iterative solvers for PageRank linear systems are an accelerator, i.e., any algorithm that speeds up the convergence of the basic Power method, and a preconditioner, which modifies the initial system to make it more amenable to iterative solution. Iterative methods are able to overcome the memory constraints of direct methods because they mainly require matrix-vector multiplications. When a sparse storage format is used, the computational complexity of matrix-vector product operations is nearly proportional to the number of non-zero elements in the matrix, i.e., O(nnz). In addition, this quantity also has a significant impact on the construction cost of matrix factorization methods such as incomplete lower-upper triangular decompositions (which are commonly used as preconditioners). As a result, reducing the density of the upper-left block is a critical issue for improving the performance of a PageRank solution. The elimination strategy [39] and the off-diagonal low-rank factorization [40] are two examples of techniques that achieve this goal by utilizing some properties of PageRank matrices. These properties are recalled below.
Property 4.1. [39] In the PageRank transition matrix P, P(j,i)=P(k,i) when these are nonzero values for 1≤i,j,k≤n.
Property 4.2. [39] For the network adjacency matrix G and the PageRank transition matrix P,
$$P(i,S)=P(j,S)\quad\text{when}\quad G(i,S)=G(j,S),$$
where 1≤i,j≤n, S⊆{1,2,⋯,n}.
Due to these properties, it is possible to determine the degree of similarity, or the identity, of two rows of P by simply comparing their sparsity patterns. If a group of rows has very similar non-zero patterns, many of their non-zero values may be annihilated by row subtraction. Note, however, that this operation may generate fill-ins outside the common non-zero pattern of two rows; therefore, the row reduction should only be executed if more non-zero values are eliminated than filled in. In general, the criterion guiding the decision whether the row subtraction row j := row j − row i should be performed is
$$comnnz(i,j)>rowsum(i)-comnnz(i,j),\tag{4.1}$$
where rowsum denotes the number of nonzero elements of a certain row of G and P, and comnnz(i,j) denotes the quantity of non-zero values in the common sparsity pattern of rows i and j.
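Thanks to Properties 4.1–4.2, criterion (4.1) can be evaluated on sparsity patterns alone; a minimal sketch, where spG is assumed to hold the logical pattern of G and i, j are two candidate row indices:

```matlab
% Sketch: decide whether the subtraction row j := row j - row i pays off.
pat_i = spG(i, :);  pat_j = spG(j, :);   % logical sparsity patterns of rows i and j
comnnz  = nnz(pat_i & pat_j);            % nonzeros in the common pattern
rowsumi = nnz(pat_i);                    % nonzeros of the reference row i
if comnnz > rowsumi - comnnz             % criterion (4.1): eliminations exceed fill-ins
    % perform row j := row j - row i, i.e., record the operator E_ij
end
```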
The elimination operation between two rows can be expressed as $E_{ij}P$, where the matrix $E_{ij}$ is obtained from the identity matrix of dimension $n$ by subtracting its $i$th row from its $j$th row.
Note that, due to the possibility of introducing negative values in row j, $E_{ij}P$ may lose Properties 4.1–4.2, as Example 4.1 below demonstrates. Once a row has been eliminated, it is marked and skipped in the subsequent stages of the elimination algorithm. The main framework of this strategy is summarized below:
(i) sequentially select a row i among the unmarked rows to be the reference row; if all rows are marked, generate the complete elimination operator E←∏Eij, which accounts for all performed row subtractions, and stop;
(ii) traverse the subsequent unmarked rows and reduce them whenever criterion (4.1) is satisfied, generating the corresponding operator matrices Eij;
(iii) mark row i and the eliminated rows, then return to step (i).
The following two improvements are introduced in this framework to enhance the overall efficiency of the elimination strategy that we published in [39].
● Since eliminating very sparse rows may not be very efficient, it is suggested to initially permute the rows in rowsum-increasing order and to skip in the search those rows whose cumulative rowsum is smaller than or equal to θ times the total number of nonzeros in G. For instance, when θ is set to 25%, we only consider the rows with the largest densities, which together contain at least 75% of the nonzero values of P.
● In the elimination, for each reference row, at most ω (typically a small number, such as 100) unmarked rows are traversed.
In web page networks, the proportion of nodes with k in-links is approximately proportional to $1/k^{\gamma}$, where the most recent estimate of $\gamma$ is 2.1 [45]; the network structure also exhibits a nested block characteristic [44]. Based on these characteristics, the above scheme can significantly reduce the time cost of dealing with web link networks, at the price of a small sacrifice in the effectiveness of eliminating nonzero values. A detailed description of this elimination process can be found in Algorithm 1 of [39]; we call this algorithm the first-step elimination algorithm.
In the ith column of the elimination operator E, the indexes of the negative values correspond to the rows reduced using the ith reference row. Meanwhile, each row of E has at most one negative element, because each row can be eliminated at most once. Therefore, nnz(E) < 2n and inv(E) = abs(E) [39]. The transformed system obtained by eliminating (1.4) can be expressed as
$$A_{E}x=v_{E},\qquad\text{with } A_{E}=EA,\; v_{E}=Ev,\tag{4.2}$$
where $A_{E}$ and $P_{E}=EP$ are expected to be sparser than $A$ and $P$, respectively.
Note that the eliminated matrix $A_{E}$ is expected to differ from $A$ in terms of eigenvalues, and it may be less favorable for iterative solvers. Fortunately, it has been demonstrated theoretically and confirmed experimentally that the perturbation is bounded by a moderate amount [39].
Although the reduced matrix EP may no longer satisfy Property 4.1 in general, the following result provides some hints for continuing the elimination.
Theorem 4.1. In the eliminated PageRank transition matrix $EP$, $(EP)(j,i)=\pm(EP)(k,i)$ whenever these are nonzero values, for $1\le i,j,k\le n$.
Proof. By Property 4.1, all nonzero entries in the jth column of P are equal. When a reference kth row with nonzero value P(k,j) is subtracted from the ith row, a nonzero element P(i,j) becomes 0, while a zero element P(i,j)=0 becomes −P(k,j); the remaining entries stay unchanged. In conclusion, the nonzero values in the jth column of EP are either equal or opposite to one another.
Example 4.1. Given a PageRank transition matrix P, the eliminated matrix EP generated after the first-step elimination algorithm will be
$$P=\begin{pmatrix}\frac14 & \frac14 & \frac13 & \frac12\\ \frac14 & \frac14 & \frac13 & 0\\ \frac14 & \frac14 & \frac13 & 0\\ \frac14 & \frac14 & 0 & \frac12\end{pmatrix}\;\longrightarrow\; EP=\begin{pmatrix}\frac14 & \frac14 & \frac13 & \frac12\\ 0 & 0 & 0 & -\frac12\\ 0 & 0 & 0 & -\frac12\\ 0 & 0 & -\frac13 & 0\end{pmatrix}.$$
The non-zero elements in each column of EP have either the same value or opposite values. It is evident that the matrix EP can be eliminated further via basic row operations; e.g., subtracting the 2nd row from the 3rd row decreases the number of non-zero values in row 3 by 1, and adding the 4th row to the 1st row reduces the non-zero elements of row 1 by 1.
In order to continue the elimination on $EP$, we need to determine accurately whether a row addition/subtraction row $j$ ± row $i$ will decrease or increase the density. Consequently, criterion (4.1) must be updated to take into account the new structure of $EP$ described by Theorem 4.1. Specifically, fixing a reference row $i$, we eliminate row $j$ if one of the two operations row $j$ := row $j$ + row $i$ or row $j$ := row $j$ − row $i$ decreases the density of row $j$. Both operations have the potential to achieve this goal. For example, if row $j=(a_{1},a_{2},a_{3},a_{4},a_{5},a_{6},a_{7})$ and row $i=(a_{1},a_{2},a_{3},a_{4},-a_{5},-a_{6},-a_{7})$ with $a_{i}>0$ ($i=1,2,\cdots,7$), both operations reduce the number of non-zeros in row $j$, but the latter should be preferred as it annihilates more entries. If we denote by $nnz(j)^{+}$ and $nnz(j)^{-}$ the numbers of non-zero elements of row $j$ + row $i$ and row $j$ − row $i$, respectively, we decide to reduce row $j$ when $\min(nnz(j)^{+},nnz(j)^{-})<nnz(j)$, and we record the type of operation (either addition or subtraction). We outline the second-step elimination algorithm as Algorithm 2. The whole two-stage elimination strategy sequentially applies the first-step elimination algorithm and then the second-step elimination algorithm.
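The decision rule of the second stage can be sketched directly on the signed matrix EP, exploiting the structure of Theorem 4.1 (floating-point cancellation issues are ignored here; names are illustrative):

```matlab
% Sketch: choose between row j + row i and row j - row i on EP.
ri = EP(i, :);  rj = EP(j, :);
nnz_plus  = nnz(rj + ri);                % density after row j := row j + row i
nnz_minus = nnz(rj - ri);                % density after row j := row j - row i
[M, ind] = min([nnz_plus, nnz_minus]);
if M < nnz(rj)                           % reduce row j only if its density decreases
    op = 1 - 2 * (ind == 2);             % +1 for addition, -1 for subtraction
    EP(j, :) = rj + op * ri;             % apply and record the chosen operation
end
```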
Algorithm 2 Second-step elimination process for PageRank models.
Input: EG, θ, ω
1: Compute the numbers of nonzero entries in the rows of EG by rowsum = sum(EG, 2).
2: Reorder rowsum into increasing order, and reorder the rows of EG accordingly.
3: Compute the smallest s satisfying rowsum(1:s) ≥ θ·nnz(EG).
4: Set mark = [0, 0, ⋯, 0] ∈ R^{1×n}.
5: for i = s+1 : n do
6:  groupi = ∅, operationi = {1};
7:  if mark(i) = 0 then
8:   groupi = {i};
9:   for j = (i+1) : min(i+ω, n) do
10:    if mark(j) = 0 then
11:     Compute nnz(j)^+ and nnz(j)^−.
12:     Compute [M, ind] = min(nnz(j)^+, nnz(j)^−).
13:     if M < nnz(j) then
14:      groupi = {groupi, j};
15:      if ind == 1 then
16:       operationi = {operationi, 1}
17:      else
18:       operationi = {operationi, −1}
19:      end if
20:      Set mark(j) = 1.
21:     end if
22:    end if
23:   end for
24:   Set mark(i) = 1.
25:  end if
26: end for
27: Construct the permutation perm = {1:m, group_{m+1}, group_{m+2}, ⋯, group_n}.
28: Generate the elimination operator E by modifying columns m+1 : n of the identity matrix according to operationi, i = m+1, m+2, …, n.
29: Permute E using the inverse permutation of perm.
30: Reorder the rows of EG and E back to the original order.
31: return E
Note that, the sequence of row-operations performed during elimination is stored in the variables m, groupi and operationi, i=m+1,m+2,…,n at lines 26−29 of Algorithm 2. These steps are straightforward to implement using a high-level programming language such as MATLAB. For the sake of clarity, we will briefly describe the implementation of line 27 in the example that follows.
Example 4.2. Suppose G is of dimension 5, m=1, group2={2,3}, group3=∅, group4={4,5}, operation2={1,−1}, operation3={1}, operation4={1,1}, operation5={1}. Then line 27 can be implemented by:
(1) row(i).index = i:i+length(groupi)-1, col(i).index = i*ones(1, length(groupi)), i=2,3,4,5;
(2) whole_row = [row(2:5).index], whole_col = [col(2:5).index];
(3) value = [operation2,⋯,operation5];
(4) EC = sparse(whole_row, whole_col, value, 4, 4);
(5) E = blkdiag(speye(5-4), EC);
The main difference between the second-step elimination algorithm (Algorithm 2) and the single-stage elimination algorithm [39], from the standpoint of computational cost, is that for each candidate reference row i the former must assess the effect on another row j of two elimination operations, namely row j := row j + row i and row j := row j − row i, while the latter assesses only one operation, row j := row j − row i. According to the analysis presented in [39], the time required to compute the elimination effect varies by problem and parameter setting. In general, it is not possible to predict which algorithm will be the most cost-efficient. For an analysis of the computational cost of the single-stage elimination algorithm, which largely carries over to the second-step elimination algorithm, one can refer to [39].
Next, we examine the impact of the two-stage elimination algorithm on the eigenvalue distribution of the Google matrix A, which may determine to a large extent the convergence speed of iterative solvers applied to (1.4). We denote by E1 and E2 the 1-step and 2-step elimination operators, respectively. We begin by introducing the following theorems.
Theorem 4.2. Let $A_{E1}=E_{1}A$ and $A_{E2}=E_{2}A_{E1}=E_{2}E_{1}A$ be the PageRank coefficient matrices after the first-step elimination and after the second-step elimination (Algorithm 2), respectively. Then the following bounds hold:
$$\|E_{2}E_{1}\|_{1}=\|(E_{2}E_{1})^{-1}\|_{1}\le 1+2\omega,\qquad \|E_{2}E_{1}\|_{\infty}=\|(E_{2}E_{1})^{-1}\|_{\infty}\le 3,\qquad \|A_{E2}\|_{1}\le(1+\alpha)(1+2\omega).\tag{4.3}$$
Proof. In the 1-step elimination procedure, we carry out the operation row j := row j − row i only if it decreases the density of row j. In both elimination steps, this operation can be carried out at most once on row j. In the 2-step elimination algorithm, the operation row j := row j + row i would set row j back to its state before the 1-step elimination, so it would actually increase the density of row j. On the other hand, the operation row j := row j − row i in the 2-step elimination would also increase the density of row j, because the zero elements produced by the 1-step elimination would change to negative values, the elements with negative values would be doubled in magnitude, and the remaining elements would stay unchanged. As a result, neither of these two operations is implemented in the 2-step elimination. In summary, if row j is reduced using the reference row i in the first stage of elimination, it will no longer be reduced using row i in the 2-step elimination process.
Therefore, the matrix $E_{2}E_{1}$ has the following form: the diagonal elements are 1, the off-diagonal elements are either 1 or −1, and the number of nonzero elements in each column is at most $1+2\omega$, because a reference row may reduce at most $\omega$ rows in each elimination process. Thus $\|E_{2}E_{1}\|_{1}\le 1+2\omega$, and $\|A_{E2}\|_{1}\le\|A\|_{1}\|E_{2}E_{1}\|_{1}\le(1+\alpha)(1+2\omega)$.
Besides, as a row may be reduced both in the 1-step and in the 2-step elimination, the number of non-zero elements in each row of $E_{2}E_{1}$ is at most 3, including the diagonal element. We conclude that $\|E_{2}E_{1}\|_{\infty}\le 3$.
By reverting the operations performed during the elimination, namely adding or subtracting each reference row back to its reduced rows, the final eliminated system can be transformed back to the original system. As a result, changing the off-diagonal elements (1 and −1) of $E_{2}E_{1}$ to their opposites yields $(E_{2}E_{1})^{-1}$. Thus $\|(E_{2}E_{1})^{-1}\|_{1}=\|E_{2}E_{1}\|_{1}\le 1+2\omega$ and $\|(E_{2}E_{1})^{-1}\|_{\infty}=\|E_{2}E_{1}\|_{\infty}\le 3$.
Lemma 4.1. [46] If the matrix $C\in\mathbb{R}^{n\times n}$ is strictly diagonally dominant by columns, with

$$\delta=\min_{k}\Big(|c_{kk}|-\sum_{j\ne k}|c_{jk}|\Big),$$

then $\|C^{-1}\|_{1}\le 1/\delta$.
Theorem 4.3. Let $\lambda_{E2}$ be an eigenvalue of $A_{E2}$. Then

$$\frac{1-\alpha}{1+2\omega}\le|\lambda_{E2}|\le(1+\alpha)(1+2\omega).$$
Proof. According to Theorem 4.2, $|\lambda_{E2}|\le\rho(A_{E2})\le\|A_{E2}\|_{1}\le(1+\alpha)(1+2\omega)$. As $A$ is strictly diagonally dominant by columns, Lemma 4.1 gives $\|A^{-1}\|_{1}\le 1/\min_{k}\big(|A_{kk}|-\sum_{j\ne k}|A_{jk}|\big)\le 1/(1-\alpha)$. Then we get $\|A_{E2}^{-1}\|_{1}\le\|(E_{2}E_{1})^{-1}\|_{1}\|A^{-1}\|_{1}\le(1+2\omega)/(1-\alpha)$ and, consequently, $|\lambda_{E2}|\ge(1-\alpha)/(1+2\omega)$.
For the case where the matrix $A=I-\alpha P$ is symmetric, we derive tighter bounds for the eigenvalues of $A_{E2}$ through the following lemma and theorem, where $\rho(\cdot)$ denotes the spectral radius.
Lemma 4.2. [47] If the matrix $A$ is normal, then

$$\rho(XA)\le\rho(A)\|X\|\quad\text{and}\quad\rho(AY)\le\rho(A)\|Y\|,$$

for any matrices $X$ and $Y$.
Proof. Because $\|\cdot\|$ is an induced matrix norm, we can write $\rho(XA)\le\|XA\|\le\|A\|\|X\|$ and $\rho(AY)\le\|AY\|\le\|A\|\|Y\|$. As $A$ is normal, $\|A\|=\rho(A)$ for the spectral norm. Therefore, the inequalities are demonstrated.
Theorem 4.4. If the PageRank coefficient matrix $A=I-\alpha P$ is symmetric, and $\lambda_{E2}$ is an eigenvalue of $A_{E2}$, then

$$\frac{1-\alpha}{\sqrt{3(1+2\omega)}}\le|\lambda_{E2}|\le\sqrt{3(1+2\omega)}\,(1+\alpha).$$
Proof. It is clear that the symmetric matrices $A$ and $A^{-1}$ are normal. According to Lemma 4.2, we get

$$|\lambda_{E2}|\le\rho(A_{E2})\le\rho(A)\|E_{2}E_{1}\|\le\rho(A)\sqrt{\|E_{2}E_{1}\|_{1}\|E_{2}E_{1}\|_{\infty}}\le\sqrt{3(1+2\omega)}\,(1+\alpha),$$

$$\rho(A_{E2}^{-1})=\rho\big(A^{-1}(E_{2}E_{1})^{-1}\big)\le\rho(A^{-1})\|(E_{2}E_{1})^{-1}\|\le\rho(A^{-1})\sqrt{\|(E_{2}E_{1})^{-1}\|_{1}\|(E_{2}E_{1})^{-1}\|_{\infty}}\le\frac{\sqrt{3(1+2\omega)}}{1-\alpha}.$$

Consequently, $|\lambda_{E2}|\ge 1/\rho(A_{E2}^{-1})\ge(1-\alpha)/\sqrt{3(1+2\omega)}$.
Note that all of the above analyses provide bounds based on worst-case scenarios. Our numerical experiments (even with non-symmetric A) show that the bounds on the spectral radius of $A_{E2}$ stated in the preceding theorem may be loose in practice. Indeed, the spectral distribution of $A_{E2}$ can be expected to be only marginally less favourable than that of $A$ for the iterative solution. In addition, efficient preconditioners (such as incomplete LU factorizations) can be used to effectively improve the spectral distribution at a moderate cost, while still benefiting from the decrease in density caused by elimination, as shown in [39].
In order to reduce the computational complexity of the PageRank problem, we propose integrating elimination strategies with lumping reordering techniques. The combination should be carried out as follows:
(1) first, the PageRank linear system is reordered to produce a more compact "kernel linear system" $(I-\alpha\tilde P_{11})\tilde x_{1}=b_{1}$, where the matrix $\tilde P_{11}$ and the coefficient matrix $(I-\alpha\tilde P_{11})$ inherit Properties 4.1–4.2;
(2) second, the elimination technique is applied to the "kernel linear system" to generate a reduced (i.e., sparser) system $E(I-\alpha\tilde P_{11})\tilde x_{1}=Eb_{1}$;
(3) then, the eliminated "kernel linear system" is solved for ˜x1 iteratively;
(4) finally, using $\tilde x_{1}$, the PageRank values of the remaining nodes are computed by a few vector-vector and matrix-vector operations, as sketched below.
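A compact end-to-end sketch of this pipeline is given below; lump5_recursive, rhs_kernel, two_stage_elimination and recover_remaining are hypothetical helper functions standing for the procedures of Sections 3 and 4, not routines provided by the paper.

```matlab
% End-to-end sketch of the combined reordering + elimination pipeline.
[perm, k] = lump5_recursive(G);               % step (1): recursive 5-type reordering (hypothetical helper)
Pp = P(perm, perm);  vp = v(perm);
A11 = speye(k) - alpha * Pp(1:k, 1:k)';       % kernel coefficient matrix in column form
b1  = rhs_kernel(vp, Pp, alpha, k);           % right-hand side, cf. (2.18)/(3.1) (hypothetical helper)
E   = two_stage_elimination(A11);             % step (2): sparsify by elimination (hypothetical helper)
x1  = gmres(E * A11, E * b1, 10, 1e-8, 100);  % step (3): solve the eliminated kernel system
x   = recover_remaining(x1, Pp, vp, alpha, perm);  % step (4): cheap recovery, permute back (hypothetical helper)
```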
Note that the theoretical results in this section concerning the matrix norms and the moduli of eigenvalues also hold when the PageRank linear system (1.4) is replaced with the "kernel linear system" (3.6), because $I-\alpha P$ and $I-\alpha\tilde P_{11}$ are both strictly diagonally dominant by columns, and $P$ and $\tilde P_{11}$ are both non-negative with 1-norms no larger than 1.
We assess and compare the effectiveness of the previously discussed reordering and elimination strategies for the solution of a large number of PageRank linear systems originating from various fields. All of the tests are implemented using MATLAB R2022a on a Windows 10 computer with an AMD Ryzen7 4800u CPU and 16 GB RAM. The selected networks, represented by binary matrices, are downloaded from the University of Florida matrix repository [48] and the Web Algorithmics Laboratory [49,50,51]. We report their characteristics in Table 1, where the matrix problems are classified according to their group name in the matrix repository, num represents the number of matrices tested in the group, n is the average dimension, nnz denotes the average number of non-zero entries per matrix, and den = nnz/n² denotes the average density. In all of our experiments, the personalization vector v of the PageRank problem and the initial guess x0 are $v=[1,1,\cdots,1]^{T}/n$ and $x_{0}=v$, respectively.
Name | num | n | nnz | den |
SNAP | 25 | 2,502,735 | 8,539,049 | 1.36×10⁻⁶
LAW | 9 | 10,416,785 | 228,858,878 | 2.11×10⁻⁶
Pajek | 10 | 386,009 | 1,572,415 | 1.06×10⁻⁵
cit | 3 | 1,279,028 | 5,764,444 | 3.52×10⁻⁶
Gleich | 6 | 3,666,739 | 34,738,258 | 2.58×10⁻⁶
Kamvar | 2 | 482,675 | 4,947,937 | 2.12×10⁻⁵
Performances of the lumping algorithms
We test the Lump-2, Lump-3 and Lump-5 algorithms and the recursive versions Lump-R and Lump-R5. We vary the number of recursive steps for the recursive orderings from 1 to 7, and we report the results after 7 recursive iterations, recording two important metrics for each run: Dratio represents the ratio between the dimension of the final "kernel linear system" corresponding to the upper-left block and the dimension of the original adjacency matrix; CPU is the elapsed CPU time (in seconds) required to execute the lumping algorithm on our computer. In Table 2, we present the mean value of these two metrics for each group of matrices.
Groups | Lump-2 Dratio | Lump-2 CPU | Lump-3 Dratio | Lump-3 CPU | Lump-R Dratio | Lump-R CPU | Lump-5 Dratio | Lump-5 CPU | Lump-R5 Dratio | Lump-R5 CPU
SNAP | 66.4% | 0.03 | 63.9% | 0.13 | 63.5% | 0.63 | 56.5% | 0.17 | 55.7% | 1.42 |
LAW | 81.6% | 0.27 | 79.7% | 3.24 | 79.3% | 10.56 | 78.0% | 3.42 | 77.3% | 26.70 |
Pajek | 74.78% | 0.04 | 67.01% | 0.10 | 54.17% | 0.32 | 57.76% | 0.15 | 38.58% | 0.30 |
cit | 79.56% | 0.07 | 72.44% | 0.21 | 57.13% | 0.64 | 57.86% | 0.28 | 31.92% | 0.59 |
Gleich | 91.42% | 0.34 | 90.74% | 1.40 | 90.56% | 7.00 | 67.99% | 1.89 | 66.41% | 15.29 |
Kamvar | 91.42% | 0.02 | 89.48% | 0.06 | 89.32% | 0.36 | 89.11% | 0.11 | 88.78% | 0.92 |
Table 2 shows that Lump-R > Lump-3 > Lump-2 in terms of their ability to reduce the problem dimension. It should be noted that Lump-2, Lump-3 and Lump-R can be viewed as instances of Lump-R using 1, 2 and 7 recursive steps, respectively. Therefore, these results are consistent with theory and with our expectations. The "kernel system" produced by Lump-2 is significantly smaller than the original one for the SNAP, LAW, Pajek and cit groups, while Lump-3 never reduces the problem dimension significantly further than Lump-2. In these categories of networks, the proportion of dangling nodes is important, whereas the proportion of weakly nondangling nodes is modest. Lump-5 outperforms Lump-3 at a modest increase in CPU time, and Lump-R5 outperforms Lump-5. Lump-5 is clearly more effective than Lump-3 at reducing the problem size for the SNAP, Pajek, cit and Gleich groups, whereas Lump-R outperforms Lump-3 for the Pajek and cit groups only. The dangling nodes in these tests have a weak recursive structure, and the proportion of unreferenced nodes is not small (although it is smaller than that of dangling nodes). Therefore, excluding the unreferenced nodes provides greater advantages than recursively excluding the dangling nodes. Lump-R5 reduces the problem size significantly further than Lump-5 for the Pajek and cit groups. Compared with the small benefit of recursively removing only the dangling nodes, recursively excluding the unreferenced nodes as well brings further reductions of the problem dimension. As a result, we can say that the unreferenced nodes in these networks have some recursive structure. Another interesting observation is that, for the recursive-type methods Lump-R and Lump-R5, the recursive process significantly increases the time cost when it has little effect on reducing the problem dimension, whereas this increase is much gentler when the effect is significant. This suggests monitoring the dimension of the upper-left linear system at each step of the recursion: if it is not significantly reduced relative to the input matrix of that step, the permutation should not be applied and the recursion should terminate.
We conclude, based on the results of this experiment, that the recursive 5-type reordering method outperforms the other lumping algorithms for our PageRank problems.
Performances of elimination algorithms
This section tests the combined effect of the proposed two-stage elimination strategy with the 5-type lumping reordering in reducing the number of non-zero elements of the PageRank linear system. We assess and compare the single-stage elimination strategy (SSES) from [39], the two-stage elimination strategy (TSES) proposed by this paper, and both elimination methods executed on the "kernel linear system" of the five-type reordering technique (called 5-re-SSES and 5-re-TSES, respectively). The latter approach is implemented in two distinct stages: first, the transition matrix P is permuted using 5-type lumping, and then elimination is applied to the upper-left block. The parameters in the elimination algorithms are set as θ=0.3 and ω=100 as suggested in [39].
The results are presented in Table 3, where NNZr represents the ratio of the amount of non-zero coefficients in the eliminated "kernel linear system" to that of the original PageRank system, and CPU represents the total elapsed CPU time cost (in seconds) required by lumping plus elimination on our computer.
Problems | SSES NNZr | SSES CPU | TSES NNZr | TSES CPU | 5-re-SSES NNZr | 5-re-SSES CPU | 5-re-TSES NNZr | 5-re-TSES CPU
cnr-2000 | 52.9% | 0.77 | 48.8% | 2.58 | 46.6% | 0.74 | 42.6% | 1.98 |
eu-2005 | 53.5% | 4.32 | 43.5% | 9.27 | 50.8% | 3.76 | 41.6% | 8.00 |
in-2004 | 42.0% | 3.43 | 36.7% | 11.34 | 37.2% | 3.43 | 32.4% | 9.40 |
indochina-2004 | 33.3% | 59.90 | 23.8% | 96.32 | 30.9% | 73.01 | 21.5% | 102.75 |
web-BerkStan | 49.7% | 1.78 | 46.1% | 5.60 | 44.6% | 2.11 | 41.9% | 5.71 |
web-Stanford | 67.5% | 1.14 | 63.9% | 2.89 | 63.0% | 1.29 | 59.9% | 2.99 |
web-NotreDame | 66.6% | 1.46 | 63.5% | 3.80 | 40.5% | 0.56 | 38.6% | 1.41 |
Stanford | 78.5% | 14.11 | 74.8% | 22.78 | 73.5% | 13.50 | 70.2% | 21.56 |
Stanford-Berkeley | 55.7% | 28.20 | 46.4% | 40.40 | 50.9% | 27.10 | 42.7% | 38.29 |
patents | 99.9% | 21.09 | 99.9% | 47.25 | 34.1% | 8.19 | 34.1% | 16.20 |
EVA | 100.0% | 0.07 | 99.8% | 0.12 | 1.6% | 0.006 | 1.6% | 0.007 |
uk-2007-100000 | 45.9% | 0.36 | 39.1% | 0.80 | 44.9% | 0.51 | 38.0% | 0.92 |
Table 3 shows that the TSES strategy presented in this work is more effective than SSES, reducing the linear system density by a further 4.58% to 28.53%. The reason is that TSES can make further use of the negative values in the matrix and continue the elimination. Similarly, 5-re-TSES outperforms 5-re-SSES. Meanwhile, 5-re-SSES and 5-re-TSES reduce the problem density better than SSES and TSES, respectively. The reason is that the upper-left block produced by the Lump-5 method contains fewer non-zero elements and is generally denser than the original matrix. Among the tested methods, 5-re-TSES is the most effective at reducing the problem density, as expected. The resulting "kernel linear system" has only 1.6% to 70.6% of the number of non-zeros in the original system. For 10 of the 12 studied problems, 5-re-TSES decreases the density by more than 50%, while TSES does this for 7 of the 12 problems, and SSES only for 4 of them. We can conclude from our results that the networks analysed exhibit a high degree of repeated row sparsity patterns, and the two-stage elimination strategy is very effective at exploiting this structural property, especially when used in combination with the 5-type reordering.
In terms of computational efficiency, 5-re-SSES and 5-re-TSES have lower CPU time costs than their counterparts SSES and TSES for two-thirds of the problems. For some adjacency matrices with a modest proportion of S&R nodes, the left-top block is small, and the elimination algorithm is fast since it traverses fewer rows. Furthermore, 5-re-SSES and 5-re-TSES clearly outperform SSES and TSES on the matrices "patents" and "EVA", because repetitive row sparsity patterns are not important in these matrices, but the proportion of S&R nodes is small, and as a result the problem density is decreased primarily with the help of the 5-type reordering approach rather than elimination procedures.
We conclude that the 5-re-SSES method is effective at reducing the density of the PageRank problems studied, and it can be efficiently implemented. It should be noted that the time cost of this pre-processing process, which includes the reordering and elimination stages, may be amortized during the solution, particularly when dealing with sequences of PageRank problems with multiple damping factors or multiple personalization vectors since it is performed only once.
Performances on accelerating PageRank computations
This section examines the performance of the SSES, TSES, 5-re-SSES, 5-re-TSES and Lump-5 methods in accelerating the convergence of Krylov subspace methods for solving the initial PageRank problem. We solve the linear system corresponding to the upper-left block, with or without elimination strategies, using the ILU-preconditioned GMRES solver [52]. The ILU factorization and the iterative solution are implemented by the ILUPACK package [53]. Note that ILUPACK can only be used on Linux-based operating systems. Each iteration begins with the zero vector $x_{0}=0$, the maximum dimension of the Krylov subspace is $m=10$, and the iterations are terminated when the approximate solution $x_{i}$ satisfies $\|(x_{i}-\alpha Px_{i})-v\|_{2}/\|v\|_{2}<10^{-8}$ or the number of restarts reaches 1000. The memory costs are quantified as
$$mem=\begin{cases}\dfrac{nnz(LU_{A})+nnz(A)}{nnz(A)}, & \text{Lump-5},\\[6pt]\dfrac{nnz(LU_{A_{E}})+nnz(A_{E})}{nnz(A)}, & \text{the system eliminated by SSES},\\[6pt]\dfrac{nnz(LU_{A_{TE}})+nnz(A_{TE})-nnz(A_{T})+nnz(A)}{nnz(A)}, & \text{the system eliminated by 5-re-SSES},\\[6pt]\dfrac{nnz(LU_{A_{TE2}})+nnz(A_{TE2})-nnz(A_{T})+nnz(A)}{nnz(A)}, & \text{the system eliminated by 5-re-TSES},\end{cases}\tag{5.1}$$
where $nnz(LU_{A})$ represents the number of non-zeros in all the multilevel ILU factors of $A$, while $A_{T}$ denotes the coefficient matrix of the upper-left "kernel system" after 5-type lumping, and $A_{TE}$ and $A_{TE2}$ denote the matrices obtained by eliminating $A_{T}$ with the SSES strategy and the TSES strategy, respectively. For fair comparisons, the parameters of the elimination algorithms are tuned to get good performance. We tune the 'droptol' parameter, which affects the ILUPACK accuracy, to ensure that the memory costs of the various tested solvers are comparable. However, the methods that incorporate elimination reduce memory costs to such an extent that the other methods often cannot guarantee convergence at the same memory footprint. Therefore, the elimination cases are tuned to have the same memory cost, while the non-elimination cases are tuned to another memory level.
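The experiments rely on ILUPACK's multilevel ILU; as a rough stand-in, the same solver setup (restart 10, tolerance 10⁻⁸) can be reproduced with MATLAB's built-in ilu and gmres, reusing the names of the pipeline sketch above. This is an illustrative sketch, not the benchmarked code.

```matlab
% Illustrative stand-in for the ILU-preconditioned GMRES solve (not ILUPACK).
AE = E * (speye(k) - alpha * P11');           % eliminated kernel coefficient matrix
bE = E * b1;
opts = struct('type', 'crout', 'droptol', 1e-3);  % droptol trades memory for accuracy
[L, U] = ilu(AE, opts);
x1 = gmres(AE, bE, 10, 1e-8, 1000, L, U);     % restart 10, tol 1e-8, up to 1000 restarts
```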
The comparison results are presented in Table 4, where Tp, Tf and Ts denote the CPU time (in seconds) needed by the pre-processing including reordering and/or elimination, the factorization step and the Krylov subspace solver, respectively, and Ttotal denotes the total CPU time. Symbol '-' represents no corresponding metric value.
Problem | Method | Dratio | NNZr | Mem | Tp | Tf | Ts | Ttotal
cnr-2000 | ILUPACK | 100.0% | 100.0% | 2.22 | - | 8.92 | 0.60 | 9.52
cnr-2000 | SSES | 100.0% | 52.9% | 1.60 | 0.95 | 2.80 | 0.27 | 4.03
cnr-2000 | Lump-5 | 74.2% | 90.8% | 2.13 | 0.13 | 4.21 | 0.36 | 4.70
cnr-2000 | 5-re-SSES | 74.2% | 46.6% | 1.61 | 0.50 | 2.20 | 0.07 | 2.76
cnr-2000 | 5-re-TSES | 74.2% | 42.6% | 1.60 | 1.06 | 2.06 | 0.05 | 3.17
web-BerkStan | ILUPACK | 100.0% | 100.0% | 2.41 | - | 26.49 | 4.96 | 31.44
web-BerkStan | SSES | 100.0% | 49.7% | 1.59 | 1.72 | 9.84 | 0.32 | 11.89
web-BerkStan | Lump-5 | 89.2% | 86.9% | 2.39 | 0.32 | 21.12 | 0.61 | 22.06
web-BerkStan | 5-re-SSES | 89.2% | 44.6% | 1.59 | 1.65 | 7.87 | 0.31 | 9.83
web-BerkStan | 5-re-TSES | 89.2% | 41.9% | 1.61 | 3.49 | 8.13 | 0.24 | 11.85
web-Stanford | ILUPACK | 100.0% | 100.0% | 2.02 | - | 5.68 | 4.06 | 9.74
web-Stanford | SSES | 100.0% | 67.5% | 1.60 | 1.26 | 3.07 | 2.13 | 6.46
web-Stanford | Lump-5 | 92.7% | 92.5% | 1.99 | 0.15 | 3.56 | 1.61 | 5.32
web-Stanford | 5-re-SSES | 92.7% | 63.0% | 1.62 | 1.21 | 2.94 | 1.19 | 5.34
web-Stanford | 5-re-TSES | 92.7% | 59.9% | 1.60 | 1.81 | 2.68 | 0.39 | 4.88
eu-2005 | ILUPACK | 100.0% | 100.0% | 2.04 | - | 53.52 | 15.76 | 69.29
eu-2005 | SSES | 100.0% | 53.5% | 1.62 | 3.78 | 44.77 | 1.51 | 50.06
eu-2005 | Lump-5 | 91.2% | 93.9% | 2.02 | 0.79 | 75.00 | 7.14 | 82.93
eu-2005 | 5-re-SSES | 91.2% | 50.8% | 1.59 | 3.86 | 37.54 | 1.47 | 42.87
eu-2005 | 5-re-TSES | 91.2% | 41.6% | 1.61 | 6.19 | 36.26 | 1.40 | 43.85
in-2004 | ILUPACK | 100.0% | 100.0% | 2.00 | - | 25.31 | 15.54 | 40.85
in-2004 | SSES | 100.0% | 42.0% | 1.23 | 3.25 | 12.76 | 0.69 | 16.70
in-2004 | Lump-5 | 77.9% | 92.2% | 2.01 | 0.74 | 21.12 | 2.32 | 24.18
in-2004 | 5-re-SSES | 77.9% | 37.2% | 1.24 | 2.60 | 9.87 | 0.36 | 12.84
in-2004 | 5-re-TSES | 77.9% | 32.4% | 1.00 | 5.71 | 8.06 | 0.45 | 14.23
uk-2007-100000 | ILUPACK | 100.0% | 100.0% | 2.04 | - | 8.19 | 0.72 | 8.91
uk-2007-100000 | SSES | 100.0% | 45.9% | 1.60 | 0.28 | 6.31 | 0.20 | 6.79
uk-2007-100000 | Lump-5 | 94.6% | 98.0% | 2.12 | 0.11 | 7.82 | 0.71 | 8.63
uk-2007-100000 | 5-re-SSES | 94.6% | 44.9% | 1.59 | 0.28 | 5.77 | 0.18 | 6.23
uk-2007-100000 | 5-re-TSES | 94.6% | 38.0% | 1.60 | 0.33 | 4.05 | 0.09 | 4.47
As shown in Table 4, SSES and Lump-5 effectively lower the factorization time Tf and the solving time Ts, while the pre-processing adds only a small extra cost Tp. The reason is that the number of non-zero values and/or the size of the "kernel system" have been reduced. As a result, both methods improve the overall computational efficiency, with decreased total time costs Ttotal. The SSES method outperforms the Lump-5 method, reducing Tf, Ts and Ttotal by larger amounts. When the two are combined, the proposed 5-re-SSES method further reduces Tf and Ts and achieves an even higher level of computational efficiency with a lower total time cost Ttotal.
Compared with the 5-re-SSES method, the proposed two-stage elimination strategy generally decreases the factorization and solving times further. However, because the two-stage elimination may significantly increase the pre-processing time relative to the one-stage elimination (note, however, that our code has not been fully optimized), the resulting 5-re-TSES method does not outperform 5-re-SSES in total time cost. As stated previously, the pre-processing can be performed only once and then reused to solve multiple PageRank problems on the same network graph; in such situations, the solution time Ts becomes the most important metric, followed by the factorization time Tf. We conclude that 5-re-TSES is preferable in terms of time consumption when solving multiple PageRank problems on the same network graph, whereas 5-re-SSES should be preferred for solving a single PageRank linear system. In terms of memory requirements, TSES-type methods are more efficient than SSES-type methods. The sketch below illustrates this reuse pattern.
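The following sketch shows how the one-time pre-processing cost can be amortized over a sequence of damping factors; the already reordered and eliminated kernel matrix is assumed to be computed once beforehand (the pre-processing itself is the paper's algorithm and is not reproduced here), and the solver setup reuses the GMRES parameters from the earlier sketch.

```python
# Amortization sketch: the pre-processing (reordering + elimination) is paid
# once per graph; only factorization (Tf) and solution (Ts) recur per alpha.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_many_dampings(P_reduced, alphas, drop_tol=1e-3):
    """P_reduced: the already reordered/eliminated kernel matrix (assumed given)."""
    n = P_reduced.shape[0]
    v = np.full(n, 1.0 / n)
    solutions = {}
    for alpha in alphas:
        # A depends on alpha, so the ILU factorization is redone per damping factor
        A = (sp.identity(n, format="csc") - alpha * P_reduced.tocsc()).tocsc()
        ilu = spla.spilu(A, drop_tol=drop_tol)
        M = spla.LinearOperator((n, n), matvec=ilu.solve)
        x, info = spla.gmres(A, v, x0=np.zeros(n), restart=10,
                             maxiter=1000, rtol=1e-8, M=M)
        solutions[alpha] = x / np.sum(x)
    return solutions
```

For a fixed damping factor and multiple personalization vectors, the ILU factorization can also be lifted out of the loop, so that only the solution time Ts recurs.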
Conclusions
For the solution of large PageRank models, in this paper we have described a recursive 5-type lumping algorithm combined with a two-stage elimination strategy that integrates information about the nonzero structure of the underlying network and the nonzero values of the PageRank coefficient matrix, reducing the dimension and the density of the PageRank system to be solved. Numerical experiments on over 50 real-world networks demonstrate that 1) real networks often exhibit deeper structural characteristics than those already reported in the literature, and these can be used for faster PageRank computations; 2) the structural properties of data networks tend to be related to their application backgrounds; and 3) the proposed methods can exploit these properties effectively and have the potential to significantly decrease the cost of PageRank solutions, especially in the case of large and/or multiple damping factors.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This research is supported by the National Natural Science Foundation of China (12101433), the Sichuan Science and Technology Program (2023NSFSC1136) and the Two-Way Support Programs of Sichuan Agricultural University, China (1921993077). Bruno Carpentieri belongs to the Gruppo Nazionale per il Calcolo Scientifico (GNCS) of the Istituto Nazionale di Alta Matematica (INdAM), and this work was partially supported by INdAM-GNCS under Progetti di Ricerca 2022. Furthermore, Bruno Carpentieri was funded by the Provincia autonoma di Bolzano/Alto Adige - Ripartizione Innovazione, Ricerca, Università e Musei (contract no. 19/34).
The authors declare no conflict of interest.
[1] S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine, Comput. Netw. ISDN Syst., 30 (1998), 107–117. https://doi.org/10.1016/S0169-7552(98)00110-X
[2] T. Zhou, E. Martinez-Baez, G. Schenter, A. E. Clark, PageRank as a collective variable to study complex chemical transformations and their energy landscapes, J. Chem. Phys., 150 (2019), 134102. https://doi.org/10.1063/1.5082648
[3] B. Liu, S. Jiang, Q. Zou, Hits-pr-hhblits: Protein remote homology detection by combining PageRank and hyperlink-induced topic search, Brief. Bioinformatics, 21 (2020), 298–308. https://doi.org/10.1093/bib/bby104
[4] M. Rafiei, A. A. Kardan, A novel method for expert finding in online communities based on concept map and PageRank, Hum. Cent. Comput. Inf. Sci., 5 (2015), 10. https://doi.org/10.1186/s13673-015-0030-5
[5] F. A. Massucci, D. Docampo, Measuring the academic reputation through citation networks via PageRank, J. Informetr., 13 (2019), 185–201. https://doi.org/10.1016/j.joi.2018.12.001
[6] M. Zhang, X. Li, L. Zhang, S. Khurshid, Boosting spectrum-based fault localization using PageRank, In: Proceedings of the 26th ACM SIGSOFT international symposium on software testing and analysis, 2017, 261–272. https://doi.org/10.1145/3092703.3092731
[7] A. Bojchevski, J. Gasteiger, B. Perozzi, A. Kapoor, M. Blais, B. Rózemberczki, et al., Scaling graph neural networks with approximate PageRank, In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, 2020, 2464–2473. https://doi.org/10.1145/3394486.3403296
[8] E. Chien, J. Peng, P. Li, O. Milenkovic, Adaptive universal generalized PageRank graph neural network, arXiv preprint, 2020. https://doi.org/10.48550/arXiv.2006.07988
[9] A. Roth, T. Liebig, Transforming PageRank into an infinite-depth graph neural network, In: Joint European conference on machine learning and knowledge discovery in databases, 2022, 469–484. https://doi.org/10.1007/978-3-031-26390-3_27
[10] D. F. Gleich, PageRank beyond the web, SIAM Rev., 57 (2015), 321–363. https://doi.org/10.1137/140976649
[11] R. A. Horn, S. Serra-Capizzano, A general setting for the parametric Google matrix, Internet Math., 3 (2008), 385–411. https://doi.org/10.1080/15427951.2006.10129131
[12] S. Serra-Capizzano, Jordan canonical form of the Google matrix: A potential contribution to the PageRank computation, SIAM J. Matrix Anal. Appl., 27 (2005), 305–312. https://doi.org/10.1137/S0895479804441407
[13] A. Langville, C. Meyer, Google's PageRank and beyond: The science of search engine rankings, Princeton: Princeton University Press, 2006. https://doi.org/10.1515/9781400830329
[14] P. G. Constantine, D. F. Gleich, Random alpha PageRank, Internet Math., 6 (2009), 189–236. https://doi.org/10.1080/15427951.2009.10129185
[15] S. D. Kamvar, T. H. Haveliwala, C. D. Manning, G. H. Golub, Extrapolation methods for accelerating PageRank computation, In: Proceedings of the 12th international conference on World Wide Web, 2003, 261–270. https://doi.org/10.1145/775152.775190
[16] X. Tan, A new extrapolation method for PageRank computations, J. Comput. Appl. Math., 313 (2017), 383–392. https://doi.org/10.1016/j.cam.2016.08.034
[17] C. Brezinski, M. Redivo-Zaglia, S. Serra-Capizzano, Extrapolation methods for PageRank computations, CR Math., 340 (2005), 393–397. https://doi.org/10.1016/j.crma.2005.01.015
[18] A. Cicone, S. Serra-Capizzano, Google PageRanking problem: The model and the analysis, J. Comput. Appl. Math., 234 (2010), 3140–3169. https://doi.org/10.1016/j.cam.2010.02.005
[19] S. D. Kamvar, T. H. Haveliwala, G. H. Golub, Adaptive methods for the computation of the PageRank, Linear Algebra Appl., 386 (2004), 51–65. https://doi.org/10.1016/j.laa.2003.12.008
[20] H. D. Sterck, T. A. Manteuffel, S. F. McCormick, Q. Nguyen, J. Ruge, Multilevel adaptive aggregation for Markov chains, with application to web ranking, SIAM J. Sci. Comput., 30 (2008), 2235–2262. https://doi.org/10.1137/070685142
[21] Z. L. Shen, T. Z. Huang, B. Carpentieri, C. Wen, X. M. Gu, Block-accelerated aggregation multigrid for Markov chains with application to PageRank problems, Commun. Nonlinear Sci. Numer. Simul., 59 (2018), 472–487. https://doi.org/10.1016/j.cnsns.2017.11.031
[22] D. F. Gleich, A. P. Gray, C. Greif, T. Lau, An inner-outer iteration for computing PageRank, SIAM J. Sci. Comput., 32 (2010), 349–371. https://doi.org/10.1137/080727397
[23] C. Q. Gu, F. Xie, K. Zhang, A two-step matrix splitting iteration for computing PageRank, J. Comput. Appl. Math., 278 (2015), 19–28. https://doi.org/10.1016/j.cam.2014.09.022
[24] C. Wen, T. Z. Huang, Z. L. Shen, A note on the two-step matrix splitting iteration for computing PageRank, J. Comput. Appl. Math., 315 (2017), 87–97. https://doi.org/10.1016/j.cam.2016.10.020
[25] Z. L. Tian, Y. Liu, Y. Zhang, Z. Y. Liu, M. Y. Tian, The general inner-outer iteration method based on regular splittings for the PageRank problem, Appl. Math. Comput., 356 (2019), 479–501. https://doi.org/10.1016/j.amc.2019.02.066
[26] M. Y. Tian, Y. Zhang, Y. D. Wang, A general multi-splitting iteration method for computing PageRank, Comput. Appl. Math., 38 (2019), 1–29. https://doi.org/10.1007/s40314-019-0830-8
[27] G. H. Golub, C. Greif, An Arnoldi-type algorithm for computing PageRank, BIT Numer. Math., 46 (2006), 759–771. https://doi.org/10.1007/s10543-006-0091-y
[28] J. F. Yin, G. J. Yin, M. Ng, On adaptively accelerated Arnoldi method for computing PageRank, Numer. Linear Algebra Appl., 19 (2012), 73–85. https://doi.org/10.1002/nla.789
[29] Z. L. Shen, H. Yang, B. Carpentieri, X. M. Gu, C. Wen, A preconditioned variant of the refined Arnoldi method for computing PageRank eigenvectors, Symmetry, 13 (2021), 1327. https://doi.org/10.3390/sym13081327
[30] H. F. Zhang, T. Z. Huang, C. Wen, Z. L. Shen, FOM accelerated by an extrapolation method for solving PageRank problems, J. Comput. Appl. Math., 296 (2016), 397–409. https://doi.org/10.1016/j.cam.2015.09.027
[31] G. Wu, Y. Wei, A power-Arnoldi algorithm for computing PageRank, Numer. Linear Algebra Appl., 14 (2007), 521–546. https://doi.org/10.1002/nla.531
[32] C. Q. Gu, X. L. Jiang, C. C. Shao, Z. B. Chen, A GMRES-Power algorithm for computing PageRank problems, J. Comput. Appl. Math., 343 (2018), 113–123. https://doi.org/10.1016/j.cam.2018.03.017
[33] Q. Y. Hu, C. Wen, T. Z. Huang, Z. L. Shen, X. M. Gu, A variant of the Power-Arnoldi algorithm for computing PageRank, J. Comput. Appl. Math., 381 (2021), 113034. https://doi.org/10.1016/j.cam.2020.113034
[34] C. Q. Gu, W. W. Wang, An Arnoldi-Inout algorithm for computing PageRank problems, J. Comput. Appl. Math., 309 (2017), 219–229. https://doi.org/10.1016/j.cam.2016.05.026
[35] D. F. Gleich, L. Zhukov, P. Berkhin, Fast parallel PageRank: A linear system approach, 2005.
[36] Y. Lin, X. Shi, Y. Wei, On computing PageRank via lumping the Google matrix, J. Comput. Appl. Math., 224 (2009), 702–708. https://doi.org/10.1016/j.cam.2008.06.003
[37] Q. Yu, Z. Miao, G. Wu, Y. Wei, Lumping algorithms for computing Google's PageRank and its derivative, with attention to unreferenced nodes, Inf. Retr., 15 (2012), 503–526. https://doi.org/10.1007/s10791-012-9183-2
[38] A. N. Langville, C. D. Meyer, A reordering for the PageRank problem, SIAM J. Sci. Comput., 27 (2006), 2112–2120. https://doi.org/10.1137/040607551
[39] Z. L. Shen, T. Z. Huang, B. Carpentieri, X. M. Gu, C. Wen, An efficient elimination strategy for solving PageRank problems, Appl. Math. Comput., 298 (2017), 111–122. https://doi.org/10.1016/j.amc.2016.10.031
[40] Z. L. Shen, T. Z. Huang, B. Carpentieri, C. Wen, X. M. Gu, X. Y. Tan, Off-diagonal low-rank preconditioner for difficult PageRank problems, J. Comput. Appl. Math., 346 (2019), 456–470. https://doi.org/10.1016/j.cam.2018.07.015
[41] Z. L. Shen, B. Carpentieri, Multi-step low-rank decomposition of large PageRank matrices, In: The 7th international conference on fuzzy systems and data mining, 340 (2021), 397–404. https://doi.org/10.3233/FAIA210212
[42] D. J. Higham, N. J. Higham, MATLAB guide, SIAM Press, 2016.
[43] C. P. Lee, G. H. Golub, S. A. Zenios, Partial state space aggregation based on lumpability and its application to PageRank, Tech. Rep., Stanford Univ., 2003.
[44] S. D. Kamvar, T. H. Haveliwala, C. D. Manning, G. H. Golub, Exploiting the block structure of the web for computing PageRank, Tech. Rep., Stanford Univ., 2003.
[45] A. Scime, Web mining: Applications and techniques, IGI Global Press, 2005. https://doi.org/10.4018/978-1-59140-414-9
[46] Y. P. Hong, C. T. Pan, A lower bound for the smallest singular value, Linear Algebra Appl., 172 (1992), 27–32. https://doi.org/10.1016/0024-3795(92)90016-4
[47] O. Axelsson, M. Neytcheva, A general approach to analyse preconditioners for two-by-two block matrices, Numer. Linear Algebra Appl., 20 (2013), 723–742. https://doi.org/10.1002/nla.830
[48] T. A. Davis, Y. Hu, The University of Florida sparse matrix collection, ACM Trans. Math. Softw., 38 (2011), 1–25.
[49] P. Boldi, S. Vigna, The webgraph framework I: Compression techniques, In: Proceedings of the 13th international conference on World Wide Web, 2004, 595–602. https://doi.org/10.1145/988672.988752
[50] P. Boldi, M. Rosa, M. Santini, S. Vigna, Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks, In: Proceedings of the 20th international conference on World Wide Web, 2011, 587–596. https://doi.org/10.1145/1963405.1963488
[51] P. Boldi, B. Codenotti, M. Santini, S. Vigna, Ubicrawler: A scalable fully distributed Web crawler, Softw. Pract. Exp., 34 (2004), 711–726. https://doi.org/10.1002/spe.587
[52] Y. Saad, M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986), 856–869. https://doi.org/10.1137/0907058
[53] M. Bollhöfer, Y. Saad, O. Schenk, ILUPACK-preconditioning software package, 2010.