1. Introduction
With the rapid development of the internet, the way people obtain information is gradually shifting to web search. The Google search engine is one of the most popular and successful search engines, and it uses PageRank to determine the importance of web pages [1].
We briefly introduce the mathematical model of the PageRank problem. If the link structure of web pages is regarded as a directed graph, then the adjacency matrix P∈Rn×n of the graph can be described as

$$P_{ij}=\begin{cases}1/n_i, & \text{if page } i \text{ links to page } j,\\ 0, & \text{otherwise},\end{cases}$$

where ni denotes the number of outlinks of page i. If page i contains no outlinks (page i is called a dangling node [2]), then the ith row of P is zero. The existence of dangling nodes is a drawback for computing PageRank. To correct this problem, a rank-1 modification is applied to obtain a row-stochastic matrix, that is,

$$\tilde{P} = P + dw^T,$$

where d=(di)∈Rn×1 with

$$d_i=\begin{cases}1, & \text{if page } i \text{ is a dangling node},\\ 0, & \text{otherwise},\end{cases}$$

and w=(wi)∈Rn×1 is a probability distribution vector.
To further guarantee the aperiodicity and irreducibility of ˜P, a convex combination built on ˜P is considered by introducing a damping factor α and a personalization vector v, which yields the Google matrix

$$A = \left[\alpha\tilde{P} + (1-\alpha)ev^T\right]^T = \alpha\tilde{P}^T + (1-\alpha)ve^T,$$

where α∈(0,1) and v=e/n with e=[1,1,…,1]T∈Rn×1. Usually, we set w=v. After these two corrections, the Google matrix A is stochastic and irreducible. Finally, the PageRank problem requires solving the following linear system:

$$Ax = x, \qquad \|x\|_1 = 1, \quad x > 0, \tag{1.1}$$

where x is the unknown PageRank vector. For more details about the formulation of PageRank problems, please refer to [3,4] and the references therein.
The traditional method for computing PageRank is the power method [5], since it relies only on matrix-vector products and requires little memory. However, when the largest and second largest eigenvalues of the matrix A are not well separated, or in other words, when the damping factor α is sufficiently close to 1, the power method performs poorly. To speed up the PageRank computation, extensive studies have been carried out over the past few years, including the extrapolation methods [6,7,8,9], the inner-outer methods [10,11,12,13,14], the adaptive methods [15,16,17] and the multigrid methods [18,19].
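As a baseline, a minimal sketch of the power method under the formulation above is given below; the function name and the column-normalized link matrix P_t are our own illustrative assumptions, not code from the references.

```python
# A minimal sketch of the power method for PageRank. P_t is the transposed,
# column-normalized link matrix (dense array or scipy.sparse matrix; column j
# is zero when page j is dangling), so A x is applied without forming A.
import numpy as np

def pagerank_power(P_t, alpha=0.99, tol=1e-8, max_iter=100000):
    n = P_t.shape[0]
    v = np.ones(n) / n                                   # personalization vector
    dangling = np.asarray(P_t.sum(axis=0)).ravel() == 0  # zero columns
    x = v.copy()
    for _ in range(max_iter):
        # x_new = A x, redistributing the dangling mass through v
        x_new = alpha * (P_t @ x + v * x[dangling].sum()) + (1 - alpha) * v
        if np.linalg.norm(x_new - x, 1) < tol:           # 1-norm residual test
            return x_new
        x = x_new
    return x
```

Since A is column-stochastic and the initial guess sums to 1, every sweep preserves the 1-norm of the iterate, so no renormalization is needed inside the loop.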
In addition, Krylov subspace methods play a powerful role in solving PageRank problems. Golub and Greif proposed an Arnoldi-type algorithm [20] without Ritz value computations by using a relevant shift in the refined Arnoldi method [21] to be 1. Wu and Wei developed a Power-Arnoldi algorithm [22] by periodically knitting the power method with the thick restarted Arnoldi algorithm [23]. Hu et al. proposed a variant of the Power-Arnoldi algorithm [24] by using the power method with the extrapolation process based on trace (PET) [8]. Wu and Wei developed an Arnoldi-Extrapolation algorithm [25] by periodically combining the Arnoldi-type method [20] with the extrapolation method based on Ritz values. Zhang et al. proposed a FOM-Extrapolation algorithm [26] by using the same extrapolation method. Gu et al. developed a GMRES-Power algorithm [27] by periodically knitting the GMRES method [28] with the power method. Yin et al. proposed a generalized Arnoldi (GArnoldi) algorithm [15] by replacing the standard inner products with the weighted inner products during the Arnoldi process. Wen et al. developed a Power-GArnoldi algorithm [16] by taking the adaptive GArnoldi method as an accelerated technique for the power method. More numerical methods for computing PageRank problems can be found in [29,30,31,32,33,34,35,36,37].
Since the Google matrix A in system (1.1) is stochastic, its largest eigenvalue is 1, and thus the linear system (1.1) can be rewritten as the following singular linear system:

$$(I - A)x = 0, \tag{1.2}$$

where I∈Rn×n is the identity matrix.
In [26], the FOM algorithm is used to solve system (1.2). Hence, inspired by the GArnoldi algorithm [15,16,17], a generalized full orthogonalization method (GFOM) based on a weighted inner product is discussed for solving the singular linear system (1.2). To speed up convergence, two acceleration techniques are applied to the GFOM algorithm: one is the power method, and the other is the extrapolation method based on Ritz values. This yields two new algorithms, called GFOM-Power and GFOM-Extrapolation, for computing PageRank. Their implementations and convergence analysis are studied in detail. Numerical results show that our proposed algorithms are superior to the other methods when the damping factor α is close to 1.
The remainder of the paper is organized as follows. In Section 2, we first briefly review the GArnoldi process, then discuss the GFOM algorithm for computing PageRank. In Section 3, we propose the GFOM-Power algorithm and the GFOM-Extrapolation algorithm for computing PageRank, and analyze their convergence properties, respectively. Numerical experiments and comparisons are reported in Section 4. Finally, conclusions are given in Section 5.
2. GFOM algorithm for computing PageRank
In this section, we first briefly review the generalized Arnoldi (GArnoldi) process [15] based on weighted inner products, and then discuss the generalized FOM (GFOM) algorithm for computing PageRank. Finally, some comparison results for the FOM algorithm and the GFOM algorithm are presented.
2.1. The GArnoldi process
Given a symmetric positive definite (SPD) matrix G=(gij)∈Rn×n and two vectors x=(xi)∈Rn×1, y=(yi)∈Rn×1, the G-inner product is defined as

$$\langle x, y\rangle_G = x^T G y.$$

Since the matrix G is symmetric positive definite, it can be diagonalized. Suppose G=QTDQ, where Q∈Rn×n is an orthogonal matrix and D=diag{d1,d2,…,dn} is an n×n diagonal matrix with di>0, i=1,2,…,n. Then the corresponding G-norm is defined as

$$\|x\|_G = \sqrt{\langle x, x\rangle_G} = \sqrt{(Qx)^T D\,(Qx)} = \|D^{1/2}Qx\|_2.$$
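For the diagonal weight matrices used later in this paper, the G-inner product and G-norm reduce to a few lines of code; the helpers below are purely illustrative.

```python
# G-inner product and G-norm for a diagonal SPD weight matrix
# G = diag(g_diag); for a general SPD G one would use x @ (G @ y).
import numpy as np

def g_inner(x, y, g_diag):
    return x @ (g_diag * y)           # <x, y>_G = x^T G y

def g_norm(x, g_diag):
    return np.sqrt(g_inner(x, x, g_diag))
```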
Given a general non-Hermitian matrix ˜A∈Rn×n, an initial vector v0∈Rn×1, an SPD matrix G, and the number of steps m, the GArnoldi process is described as follows. More details can be found in [15,16,17].
Note that when G=I, the GArnoldi process reduces to the standard Arnoldi process [21,38]. According to Algorithm 1, the following relations hold:

$$\tilde{A}V_m = V_m H_m + h_{m+1,m}v_{m+1}e_m^T, \qquad V_m^T G\tilde{A}V_m = H_m,$$

where Vk=[v1,v2,…,vk]∈Rn×k (k=m, m+1) is a G-orthogonal matrix, Hm=(hi,j)∈Rm×m is an upper Hessenberg matrix and em∈Rm×1 is the mth coordinate vector.
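As an illustration of these relations, the following sketch implements an m-step GArnoldi process with a diagonal SPD weight (the case used in this paper) via modified Gram-Schmidt in the G-inner product; it is our reading of the process in [15], not the authors' implementation.

```python
# Sketch of the m-step GArnoldi process with diagonal SPD weight
# G = diag(g_diag): modified Gram-Schmidt in the G-inner product,
# returning V (G-orthonormal columns), the Hessenberg H, and beta = ||v0||_G.
import numpy as np

def garnoldi(matvec, v0, m, g_diag):
    n = v0.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.sqrt(v0 @ (g_diag * v0))           # ||v0||_G
    V[:, 0] = v0 / beta
    for j in range(m):
        w = matvec(V[:, j])
        for i in range(j + 1):                    # G-orthogonalize against v_1..v_{j+1}
            H[i, j] = V[:, i] @ (g_diag * w)
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.sqrt(w @ (g_diag * w))   # ||w||_G
        if H[j + 1, j] < 1e-14:                   # (lucky) breakdown
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta
```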
2.2. The GFOM algorithm
In this subsection, by using the G-inner product as given above, we consider a generalized FOM (GFOM) algorithm for computing PageRank. The specific implementation of the GFOM algorithm is given as follows.
Here, some remarks need to be given about Algorithm 2.
● In line 5, the GFOM algorithm seeks a solution x satisfying the Galerkin condition −(I−A)x⊥Km(I−A,r0), which leads to the following formulation:

$$x = x_0 + V_m y, \qquad H_m y = \beta e_1, \quad \beta = \|r_0\|_G,$$

where e1=[1,0,…,0]T. Thus, if Hm is nonsingular, the vector y can be computed as y=βH_m^{−1}e1.
● In line 7, there are two differences between the GFOM algorithm and the FOM algorithm [26]. One is the 2-norm of the residual vector r, which is computed as ‖r‖2 = |h_{m+1,m}|⋅|y(m)|⋅‖v_{m+1}‖2 rather than |h_{m+1,m}|⋅|y(m)|, where y(m) denotes the last component of y; the extra factor appears because v_{m+1} has unit G-norm instead of unit 2-norm. The other is the convergence condition, which is set as ‖r‖2/‖x‖1 ≤ tol rather than ‖r‖2/‖x‖2 ≤ tol.
● In the first cycle, the GFOM algorithm reduces to the FOM algorithm since G=I. For the other cycles, following line 10 of Algorithm 2, we construct the matrix G as

$$G = \operatorname{diag}\left\{\frac{|r_1|}{\|r\|_1},\frac{|r_2|}{\|r\|_1},\ldots,\frac{|r_n|}{\|r\|_1}\right\}, \tag{2.2}$$

where ri is the ith component of the residual vector r.
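Putting these remarks together, one cycle of the GFOM algorithm might be organized as in the sketch below, which reuses the garnoldi sketch above; the sparse matrix A and the small floor on the weights are illustrative assumptions, and this is our hedged reading of Algorithm 2, not the published code.

```python
# One GFOM cycle for (I - A)x = 0: run m GArnoldi steps on r0 = -(I - A)x,
# solve the small Galerkin system H_m y = beta*e1, update x, and build the
# next weight matrix G = diag(|r_i| / ||r||_1) from the new residual.
import numpy as np

def gfom_cycle(A, x, g_diag, m):
    matvec = lambda u: u - A @ u                 # coefficient matrix I - A
    r0 = -matvec(x)
    V, H, beta = garnoldi(matvec, r0, m, g_diag)
    rhs = np.zeros(m); rhs[0] = beta
    y = np.linalg.solve(H[:m, :m], rhs)          # y = beta * H_m^{-1} e1
    x = x + V[:, :m] @ y
    res2 = abs(H[m, m - 1]) * abs(y[-1]) * np.linalg.norm(V[:, m])  # ||r||_2
    r = -matvec(x)
    g_new = np.abs(r) / np.linalg.norm(r, 1)     # next diagonal weights
    g_new = np.maximum(g_new, 1e-16)             # floor keeps G positive definite
    return x, res2 / np.linalg.norm(x, 1), g_new
```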
2.3. Comparisons of the FOM algorithm and the GFOM algorithm
In this subsection, we aim at testing the effectiveness of the GFOM algorithm by comparing it with the FOM algorithm [26] for computing PageRank. The same initial guess x0=e/n is used, where e=[1,1,…,1]T. The damping factors are chosen as α = 0.99, 0.993, 0.995 and 0.997, respectively. The stopping criteria are set as |h_{m+1,m}|⋅|y(m)|/‖x‖1 ≤ tol for the FOM algorithm and |h_{m+1,m}|⋅|y(m)|⋅‖v_{m+1}‖2/‖x‖1 ≤ tol for the GFOM algorithm, where tol=10−8 is the user-prescribed tolerance.
Here we show the choice of the restart number m by analyzing the numerical performance of the FOM algorithm and the GFOM algorithm on the wb-cs-stanford matrix, which is available from [39] and contains 9914 pages and 36,854 links. Table 1 reports the number of matrix-vector products of the FOM algorithm and the GFOM algorithm when α=0.99, 0.993, 0.995, 0.997 and m=6, 7, 8, 9, 10, respectively. Figure 1 depicts the total CPU time of the two algorithms versus the restart number m.
From Table 1, we observe that the number of matrix-vector products of the FOM algorithm and the GFOM algorithm first decreases and then increases as m grows in most cases; m=8 appears to be the turning point on the whole. Meanwhile, from Figure 1, we find that the restart number m giving the minimum CPU time differs with the damping factor α. However, considering the memory requirements of the two algorithms, m=8 is a reasonable choice for most cases. Hence, we set m=8 in the following numerical experiments.
Figure 2 plots the convergence behavior of the FOM algorithm and the GFOM algorithm for the wb-cs-stanford matrix when α=0.99, 0.993, 0.995 and 0.997, respectively. It shows that the GFOM algorithm converges faster than the FOM algorithm. Thus, it is meaningful to introduce weighted inner products into the standard Arnoldi process.
3. Acceleration of the GFOM algorithm for computing PageRank
Since the GFOM algorithm requires more memory as the restart number m increases, two acceleration techniques are used to improve it: one is the power method and the other is the extrapolation method based on Ritz values. Thus, two new algorithms, named GFOM-Power and GFOM-Extrapolation, are developed for computing PageRank, and their constructions and convergence analyses are discussed in detail.
3.1. The GFOM-Power algorithm for computing PageRank
In this subsection, we first give the description of the GFOM-Power algorithm, and then discuss its convergence.
3.1.1. The GFOM-Power algorithm
Similar to the construction of the algorithms in [16,22,24], the idea of the GFOM-Power algorithm can be described as follows: given an initial vector x0, we first run Algorithm 2 for a few cycles (e.g., 2–3) to obtain an approximation; if the approximation is unsatisfactory, we use the resulting vector as the initial guess of the power method to obtain another approximation. If this approximation is still unsatisfactory, the above procedure is repeated periodically until the required tolerance is achieved.
One of the problems is when and how to control the switch between the power method and the GFOM algorithm. Here we use the parameters ϕ, restart and maxit as the flip-flop [22]. Let τcurr and τprev be the residual norms of the current and previous power iterations, respectively, and denote ratio=τcurr/τprev. If ratio is greater than ϕ, we set restart=restart+1; we then check whether restart is larger than maxit, and if so, we terminate the power iterations and cycle back to the GFOM algorithm. Otherwise, we continue to run the power method. Considering the asymptotic convergence rate of the power method for PageRank problems [5], it is reasonable to choose ϕ=α−0.1. In summary, the GFOM-Power algorithm for computing PageRank is given as follows.
Note that the residual r in line 3.7 of Algorithm 3 is used not only to construct G (see line 3.17), but also to check convergence.
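A sketch of the power phase with the flip-flop control described above is given below; apply_A is a hypothetical matvec for the Google matrix A, and ϕ = α − 0.1 as suggested. The sketch shows the control logic only, not the exact bookkeeping of Algorithm 3.

```python
# Power phase of GFOM-Power: iterate until converged, but count how often
# the residual-decay ratio exceeds phi; after maxit slow steps, hand the
# current iterate back to the GFOM cycles.
import numpy as np

def power_phase(apply_A, x, tol, phi, maxit):
    restart, tau_prev = 0, np.inf
    while True:
        x_new = apply_A(x)
        x_new = x_new / np.linalg.norm(x_new, 1)
        tau_curr = np.linalg.norm(x_new - x, 1)      # residual norm of this sweep
        x = x_new
        if tau_curr < tol:
            return x, True                            # converged
        if tau_curr / tau_prev > phi:                 # decay slower than phi
            restart += 1
            if restart > maxit:
                return x, False                       # switch back to GFOM
        tau_prev = tau_curr
```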
3.1.2. Convergence analysis of the GFOM-Power algorithm
In this subsection, we analyze the convergence of the GFOM-Power algorithm. In particular, our analysis focuses on the transition from the power method to the GFOM algorithm.
Assume that the eigenvalues of the Google matrix A are ordered as 1=|λ1|>|λ2|≥⋯≥|λn|, and let Λ(A) denote the set of eigenvalues of A. Let P_{m−1} be the set of polynomials of degree not greater than m−1, and let (λi,μi), i=1,2,…,n, denote the eigenpairs of A. For two given vectors v and w, cos∠(v,w) = ⟨v,w⟩/(‖v‖2‖w‖2) denotes the cosine of the angle between them. We first introduce a useful theorem about the spectrum of the Google matrix A.
Theorem 1 [40]. Let ˜P be an n×n column-stochastic matrix. Let α be a real number such that 0<α<1. Let E be the n×n rank-one column-stochastic matrix E=veT, where e is the n-vector whose elements are all ones and v is an n-vector whose elements are all nonnegative and sum to 1. Let A=α˜P+(1−α)E be the resulting n×n column-stochastic matrix; then its dominant eigenvalue is λ1=1 and |λ2|≤α.
We now analyze the convergence of the GFOM-Power algorithm. Comparing the FOM algorithm and the GFOM algorithm, it is not difficult to see that the main difference between them lies in the Arnoldi process: the former uses the 2-norm, while the latter uses the G-norm. Suppose the weight matrix G is given as in Eq (2.2); then the relation between the G-norm and the 2-norm is as follows.
Lemma 1 [16]. Let G=diag{δ1,δ2,⋯,δn}, δi>0, 1≤i≤n, be a diagonal matrix. For any vector x∈Rn×1, according to the definitions of the G-norm and the 2-norm, it holds that

$$\sqrt{\min_{1\le i\le n}\delta_i}\;\|x\|_2 \;\le\; \|x\|_G \;\le\; \sqrt{\max_{1\le i\le n}\delta_i}\;\|x\|_2.$$
Since our analysis focuses on the transition from the power method to the GFOM algorithm, it is necessary to derive the iterative formula of the power method in Algorithm 3. Let ˜x be the initial vector for the power method, which is obtained from the GFOM algorithm. Then the power method in Algorithm 3 produces the vector x_P=ωA^l˜x, where l≥maxit and ω=1/‖A^l˜x‖1 is the normalizing factor. In the next cycle of the GFOM-Power algorithm, the resulting vector x_P is used as the initial vector for an m-step GFOM algorithm, so that the new associated Krylov subspace is

$$K_m(I-A, r_P) = \operatorname{span}\left\{r_P,\, (I-A)r_P,\, \ldots,\, (I-A)^{m-1}r_P\right\},$$

where r_P=−(I−A)x_P. For any vector u∈Km(I−A,r_P), there exists a polynomial p∈P_{m−1} achieving min_{p∈P_{m−1}} max_{λ∈Λ(A)∖{λ1}} |p(λ)|, so that u=x_P+p(I−A)r_P. Then, we present the convergence result of the GFOM-Power algorithm for computing PageRank as follows.
Theorem 2. Let ˜x be the current approximate vector obtained from the GFOM algorithm, and assume that ˜x=∑_{i=1}^n γiμi with respect to the eigenbasis {μi}_{i=1,2,…,n}, in which ‖μi‖2=1, i=1,2,…,n, and γi≠0. Then the relationship between the next approximate vector u and μ1 holds asymptotically that

$$\cos\angle(u,\mu_1) \;\ge\; \frac{|\gamma_1| - \alpha^l\varepsilon_1\sum_{i=2}^n|\gamma_i|}{|\gamma_1| + \alpha^l\varepsilon_1\sum_{i=2}^n|\gamma_i|},$$

where l≥maxit and ε1 = min_{p∈P_{m−1}} max_{λ∈Λ(A)∖{λ1}} |q(λ)| with q(λ)=1−(1−λ)p(1−λ).
Proof. From step 3 of Algorithm 3, we know that ˜x is used as the initial vector for the power iterations; thus x_P=ωA^l˜x, which can be represented by

$$x_P = \omega A^l\tilde{x} = \omega\left(\gamma_1\mu_1 + \sum_{i=2}^n \gamma_i\lambda_i^l\mu_i\right). \tag{3.1}$$

The corresponding residual is computed as r_P=−(I−A)x_P, which is used as the initial residual vector for the next GFOM cycle. Thus the next approximate vector u∈Km(I−A,r_P) is obtained, and there exists a polynomial p∈P_{m−1} so that

$$u = x_P + p(I-A)r_P = \omega\left(\gamma_1\mu_1 + \sum_{i=2}^n \gamma_i\lambda_i^l\, q(\lambda_i)\,\mu_i\right),$$

with q(λ)=1−(1−λ)p(1−λ), where we used the facts that λ1=1 and Aμi=λiμi (i=1,2,…,n).
By computing, we have

$$\cos\angle(u,\mu_1) = \frac{\langle u,\mu_1\rangle}{\|u\|_2\|\mu_1\|_2}. \tag{3.2}$$

Using Eq (3.1) and ‖μi‖2=1, for the numerator of Eq (3.2), it holds that

$$|\langle u,\mu_1\rangle| \ge \omega\left(|\gamma_1| - \alpha^l\varepsilon_1\sum_{i=2}^n|\gamma_i|\right). \tag{3.3}$$

On the other hand, for the denominator of Eq (3.2), it holds that

$$\|u\|_2 \le \omega\left(|\gamma_1| + \alpha^l\varepsilon_1\sum_{i=2}^n|\gamma_i|\right). \tag{3.4}$$

Substituting Eqs (3.3) and (3.4) into (3.2), we obtain

$$\cos\angle(u,\mu_1) \ge \frac{|\gamma_1| - \alpha^l\varepsilon_1\sum_{i=2}^n|\gamma_i|}{|\gamma_1| + \alpha^l\varepsilon_1\sum_{i=2}^n|\gamma_i|},$$

where we used the result in Theorem 1, |λn|≤⋯≤|λ2|≤α, together with |q(λi)|≤ε1.
3.2. The GFOM-Extrapolation algorithm for computing PageRank
In this subsection, we describe the GFOM-Extrapolation algorithm and discuss its convergence.
3.2.1. The GFOM-Extrapolation algorithm
The extrapolation procedure is based on power iterations and Ritz values. Assume that the initial approximation x0 can be written as the following linear combination:

$$x_0 = \mu_1 + a_2\mu_2 + a_3\mu_3,$$

where μ1, μ2, μ3 are the eigenvectors of the Google matrix A corresponding to its three largest eigenvalues λ1, λ2 and λ3, respectively. After two power iterations, we have

$$x_1 = Ax_0 = \mu_1 + a_2\lambda_2\mu_2 + a_3\lambda_3\mu_3, \qquad x_2 = Ax_1 = \mu_1 + a_2\lambda_2^2\mu_2 + a_3\lambda_3^2\mu_3.$$

Then, since λi^2−(λ2+λ3)λi+λ2λ3 = 0 for i = 2, 3, it follows that

$$x_2 - (\lambda_2+\lambda_3)x_1 + \lambda_2\lambda_3 x_0 = (1-\lambda_2)(1-\lambda_3)\mu_1.$$

By normalizing μ1, an improved approximation to the PageRank vector can be computed as

$$x_{Ex} = \frac{x_2 - (\lambda_2+\lambda_3)x_1 + \lambda_2\lambda_3 x_0}{\|x_2 - (\lambda_2+\lambda_3)x_1 + \lambda_2\lambda_3 x_0\|_2}. \tag{3.6}$$
One advantage of the extrapolation procedure is that the unknown eigenvalues λ2 and λ3 of the Google matrix A can be estimated along with step 4 of Algorithm 2. It is well known that the Ritz values, defined as the eigenvalues of the upper Hessenberg matrix Hm, can be used to approximate the true eigenvalues of the coefficient matrix I−A. Let ˜λ2 and ˜λ3 be the second and third largest eigenvalues of Hm; then we make the simple transformation

$$\lambda_2 \approx 1 - \tilde{\lambda}_2, \qquad \lambda_3 \approx 1 - \tilde{\lambda}_3,$$

which converts the Ritz values of I−A into estimates of the eigenvalues of A.
Since ˜λ2 and ˜λ3 may be complex valued, the following classifications are made, please refer to [25,26] for details.
Case 1. If both ˜λ2 and ˜λ3 are real, or ˜λ2 and ˜λ3 are conjugate, then xEx is computed as Eq (3.6).
Case 2. If ˜λ2 is real but ˜λ3 is complex, we assume that x0=μ1+a2μ2 and x1=Ax0=μ1+a2λ2μ2; then xEx is computed as

$$x_{Ex} = \frac{x_1 - \lambda_2 x_0}{\|x_1 - \lambda_2 x_0\|_2}.$$
Case 3. If ˜λ2 is complex but ˜λ3 is real, or both ˜λ2 and ˜λ3 are complex but not conjugate, then λ3 is replaced by the complex conjugate ¯λ2, and xEx is computed as

$$x_{Ex} = \frac{x_2 - 2\operatorname{Re}(\lambda_2)x_1 + |\lambda_2|^2 x_0}{\|x_2 - 2\operatorname{Re}(\lambda_2)x_1 + |\lambda_2|^2 x_0\|_2}.$$
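The Ritz-value estimation and the three cases can be wired together as in the sketch below, under the assumption that the Ritz values of Hm approximate the eigenvalues of I−A (so eigenvalue estimates of A are obtained as 1 minus the Ritz values); this is our interpretation of the procedure in [25,26], with illustrative names.

```python
# Quadratic extrapolation from Ritz values: estimate lambda_2, lambda_3
# from H_m, then form x_Ex according to Cases 1-3; apply_A is a matvec
# for the Google matrix A.
import numpy as np

def extrapolate(apply_A, x0, H_m):
    lam = 1.0 - np.linalg.eigvals(H_m)           # eigenvalue estimates of A
    lam = lam[np.argsort(-np.abs(lam))]          # sort by modulus; lam[0] ~ 1
    l2, l3 = lam[1], lam[2]
    x1 = apply_A(x0)
    real2, real3 = abs(l2.imag) < 1e-12, abs(l3.imag) < 1e-12
    if real2 and not real3:                      # Case 2: use lambda_2 only
        y = x1 - l2.real * x0
        return y / np.linalg.norm(y, 2)
    if (real2 and real3) or np.isclose(l3, np.conj(l2)):  # Case 1
        s, p = (l2 + l3).real, (l2 * l3).real
    else:                                        # Case 3: pair l2 with conj(l2)
        s, p = 2.0 * l2.real, abs(l2) ** 2
    x2 = apply_A(x1)
    y = x2 - s * x1 + p * x0                     # x2 - (l2+l3) x1 + l2 l3 x0
    return y / np.linalg.norm(y, 2)
```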
Now we accelerate the GFOM algorithm with the extrapolation procedure based on Ritz values for computing PageRank. The specific implementation is given as follows.
Note that the choice of the number maxnum in line 5.1 of Algorithm 4 will be discussed in Section 4.
3.2.2. Convergence analysis of the GFOM-Extrapolation algorithm
In this subsection, we analyze the convergence of the GFOM-Extrapolation algorithm. In particular, our analysis focuses on the transition from the extrapolation procedure to the GFOM algorithm.
As shown in [41], the second largest eigenvalue λ2 of the Google matrix A is semi-simple. Here we assume that the third largest eigenvalue λ3 is also semi-simple and A is diagonalizable. The following theorem shows the form of the new PageRank vector obtained after the extrapolation procedure in Algorithm 4.
Theorem 3. Let ˜x be the initial vector for the extrapolation procedure, which is obtained from the previous GFOM cycle. Assume that ˜x can be expressed as ˜x=μ1+∑_{i=2}^n aiμi with respect to the eigenbasis {μi}_{i=1,2,…,n}, in which ‖μi‖2=1, i=1,2,…,n. Then the new initial vector obtained from the extrapolation procedure for the next GFOM cycle is

$$x_{Ex} = \frac{1}{c}\left[(1-\lambda_2)(1-\lambda_3)\mu_1 + \sum_{i=4}^n a_i\left(\lambda_i^2-(\lambda_2+\lambda_3)\lambda_i+\lambda_2\lambda_3\right)\mu_i\right],$$

where c=‖x2−(λ2+λ3)x1+λ2λ3x0‖2 is the scaling factor and f denotes the number of times the extrapolation procedure is applied, which is at most maxnum.
Proof. The proof of Theorem 3 is analogous to the convergence analyses of the Arnoldi-Extrapolation algorithm and the FOM-Extrapolation algorithm; please refer to [25,26] for details.
Theorem 4. Under the above assumptions, set S={λ1,λ2,λ3}. Then the relationship between the next approximate vector u and μ1 holds asymptotically that

$$\cos\angle(u,\mu_1) \;\ge\; \frac{1-\xi\varepsilon_2}{1+\xi\varepsilon_2},$$

where

$$\xi = \sum_{i=4}^n |a_i|\,\frac{\left|\lambda_i^2-(\lambda_2+\lambda_3)\lambda_i+\lambda_2\lambda_3\right|}{\left|(1-\lambda_2)(1-\lambda_3)\right|}$$

and ε2 = min_{p∈P_{m−1}} max_{λ∈Λ(A)∖S} |q(λ)| with q(λ)=1−(1−λ)p(1−λ).
Proof. According to Algorithm 4 and Theorem 3, after the extrapolation procedure, the residual vector of the approximate vector xEx is computed as rEx=−(I−A)xEx, which is used as the initial residual vector to run the next GFOM cycle. Then we get the next approximation u∈Km(I−A,rEx), and there exists a polynomial p∈P_{m−1} so that

$$u = x_{Ex} + p(I-A)r_{Ex} = \frac{1}{c}\left[(1-\lambda_2)(1-\lambda_3)\mu_1 + \sum_{i=4}^n a_i\left(\lambda_i^2-(\lambda_2+\lambda_3)\lambda_i+\lambda_2\lambda_3\right)q(\lambda_i)\,\mu_i\right],$$

with q(λ)=1−(1−λ)p(1−λ), where we used the facts that λ1=1, Aμ1=μ1 and Aμi=λiμi (i=4,…,n).
Then we have

$$\cos\angle(u,\mu_1) = \frac{\langle u,\mu_1\rangle}{\|u\|_2\|\mu_1\|_2}. \tag{3.10}$$

Now, using the above expansion of u and ‖μi‖2=1, for the numerator of Eq (3.10), it holds that

$$|\langle u,\mu_1\rangle| \ge \frac{|(1-\lambda_2)(1-\lambda_3)|}{c}\left(1 - \xi\varepsilon_2\right). \tag{3.11}$$

On the other hand, for the denominator of Eq (3.10), it holds that

$$\|u\|_2 \le \frac{|(1-\lambda_2)(1-\lambda_3)|}{c}\left(1 + \xi\varepsilon_2\right). \tag{3.12}$$

Substituting Eqs (3.11) and (3.12) into (3.10), we obtain

$$\cos\angle(u,\mu_1) \ge \frac{1-\xi\varepsilon_2}{1+\xi\varepsilon_2},$$

where we also used the result in Theorem 1 and |λn|≤⋯≤|λ2|≤α.
4. Numerical experiments
In this section, we present numerical experiments to verify the effectiveness of the proposed algorithms, the GFOM-Power algorithm and the GFOM-Extrapolation algorithm, in terms of the CPU time in seconds and the number of matrix-vector products (Mv). All numerical results are obtained using MATLAB 2019a on a Windows 10 64-bit operating system with a 2.40 GHz Intel(R) Core(TM) i7-5500U CPU and 8.00 GB of RAM.
The characteristics of the test matrices are listed in Table 2, where n stands for the matrix size, nnz denotes the number of nonzero elements, numd is the number of dangling nodes and den is the density, defined by den = nnz/(n×n) × 100%. All test matrices are available from [39]. For fairness, all algorithms use the same initial guess x0=e/n with e=[1,1,…,1]T and the same tolerance tol=10−8. The damping factors are set as α = 0.99, 0.993, 0.995 and 0.997, respectively.
4.1. Numerical comparisons for the GFOM-Power algorithm
In this subsection, we test the effectiveness of the GFOM-Power algorithm (denoted as "GFOM-P"), and compare it with the Power-Arnoldi algorithm (denoted as "PA") [22], the Arnoldi-Chebyshev algorithm (denoted as "AC") [32], as well as the GFOM algorithm (denoted as "GFOM") as given in Algorithm 2. The test matrices are soc-Epinions1, eu-2005, in-2004 and wikipedia-20051105. We define

$$\text{speedup} = \frac{\mathrm{CPU}_{\text{GFOM}} - \mathrm{CPU}_{\text{GFOM-P}}}{\mathrm{CPU}_{\text{GFOM}}} \times 100\%$$

to record the speedup of the GFOM-Power algorithm with respect to the GFOM algorithm in terms of CPU time. Note that, in the Power-Arnoldi algorithm, we run the thick restarted Arnoldi procedure two times per cycle with the number of approximate eigenpairs k=6. In the GFOM-Power algorithm, we also run the GFOM procedure two times per cycle. In the Arnoldi-Chebyshev algorithm, we set the number of Chebyshev steps l=10.
4.1.1. The choice of the number maxit
In this subsection, we show the choice of the number maxit according to the numerical performance of the Power-Arnoldi algorithm and the GFOM-Power algorithm for the wb-cs-stanford matrix. Table 3 lists the number of matrix-vector products of the Power-Arnoldi algorithm and the GFOM-Power algorithm when the damping factor α=0.99, 0.993, 0.995, 0.997 and the number maxit varies from 6 to 9, respectively. Figure 3 depicts the total CPU time of the Power-Arnoldi algorithm and the GFOM-Power algorithm versus different number maxit.
From Table 3, we find that the number of matrix-vector products of the Power-Arnoldi algorithm and the GFOM-Power algorithm first decreases and then increases as maxit grows when α=0.99 and 0.993. For α=0.995 and 0.997, the number of matrix-vector products of both algorithms is relatively small when maxit=8. Meanwhile, from Figure 3, we observe that the value of maxit giving the minimum CPU time differs with the damping factor α: in the left picture of Figure 3, the total CPU time of the Power-Arnoldi algorithm is smallest when maxit=8, while in the right picture both maxit=8 and 9 appear feasible. Therefore, as a compromise, maxit=8 is selected in the following numerical experiments.
4.1.2. Numerical comparisons
In this subsection, Table 4 lists numerical results of the GFOM algorithm, the Power-Arnoldi algorithm, the Arnoldi-Chebyshev algorithm and the GFOM-Power algorithm for these four test matrices when α=0.99, 0.993, 0.995 and 0.997, respectively.
From Table 4, we can observe that,
● The GFOM-Power algorithm makes great improvements on the GFOM algorithm in terms of the number of matrix-vector products and the CPU time for these four test matrices with different damping factors α. In particular, the reduction of the CPU time reaches 53.76% for the soc-Epinions1 matrix with α=0.993, which shows that using the power method to accelerate the GFOM algorithm is meaningful.
● The GFOM-Power algorithm also outperforms the Power-Arnoldi algorithm and the Arnoldi-Chebyshev algorithm in terms of the number of matrix-vector products and the CPU time for these four test matrices in most cases. Hence, we can say that the GFOM-Power algorithm is more efficient than the Power-Arnoldi algorithm and the Arnoldi-Chebyshev algorithm, especially when the damping factor α is high.
Figure 4 plots the convergence behavior of the GFOM algorithm, the Power-Arnoldi algorithm, the Arnoldi-Chebyshev algorithm and the GFOM-Power algorithm for the soc-Epinions1 matrix when α = 0.99, 0.993, 0.995 and 0.997, respectively. It shows that the GFOM-Power algorithm converges faster than the other three algorithms. Similar convergence behaviors are observed for the other three test matrices, so we omit their plots for simplicity.
4.2. Numerical comparisons for the GFOM-Extrapolation algorithm
In this subsection, we test the effectiveness of the GFOM-Extrapolation algorithm (denoted as "GFOM-EXT"), and compare it with the FOM-Extrapolation algorithm (denoted as "FOM-EXT") [26], the Arnoldi-Chebyshev algorithm [32] and the GFOM algorithm (denoted as "GFOM") as given in Algorithm 2. The test matrices are California, soc-Epinions1, flickr and wiki-Talk. We define

$$\text{speedup} = \frac{\mathrm{CPU}_{\text{GFOM}} - \mathrm{CPU}_{\text{GFOM-EXT}}}{\mathrm{CPU}_{\text{GFOM}}} \times 100\%$$

to record the speedup of the GFOM-Extrapolation algorithm with respect to the GFOM algorithm in terms of CPU time. Note that we run the GFOM procedure two times per cycle in the GFOM-Extrapolation algorithm.
4.2.1. The choice of the number maxnum
In this subsection, we show the choice of the number maxnum by analyzing the numerical behavior of the FOM-Extrapolation algorithm and the GFOM-Extrapolation algorithm for the wb-cs-stanford matrix. Table 5 lists the number of matrix-vector products for the FOM-Extrapolation algorithm and the GFOM-Extrapolation algorithm when the damping factor α=0.99, 0.993, 0.995, 0.997 and the number maxnum=2, 3, 4, 5, respectively. Figure 5 depicts the total CPU time of the FOM-Extrapolation algorithm and the GFOM-Extrapolation algorithm versus different number maxnum.
From Table 5, we observe that the number of matrix-vector products of the FOM-Extrapolation algorithm and the GFOM-Extrapolation algorithm first decreases and then increases as maxnum grows in most cases. Meanwhile, from Figure 5, we find that the CPU time of the two algorithms follows a similar trend for different damping factors α in most cases; in particular, the CPU time is relatively small when maxnum=4. Hence, we choose maxnum=4 in the following numerical experiments.
4.2.2. Numerical comparisons
In this subsection, Table 6 lists numerical results of the GFOM algorithm, the Arnoldi-Chebyshev algorithm, the FOM-Extrapolation algorithm and the GFOM-Extrapolation algorithm for these four test matrices when α=0.99, 0.993, 0.995 and 0.997, respectively.
From Table 6, we find that,
● The GFOM-Extrapolation algorithm is superior to the GFOM algorithm in terms of the number of matrix-vector products and the CPU time for these four test matrices with different damping factor α. Particularly, the speedup is up to 56.90% for the wiki-Talk matrix with α=0.997. Hence, we can say that it is meaningful to use the extrapolation procedure based on Ritz values to improve the GFOM algorithm.
● The GFOM-Extrapolation algorithm needs more CPU time than the Arnoldi-Chebyshev algorithm in some cases. However, in terms of the number of matrix-vector products, the GFOM-Extrapolation algorithm is superior to the Arnoldi-Chebyshev algorithm for all the test matrices.
● The GFOM-Extrapolation algorithm works better than the FOM-Extrapolation algorithm in terms of the number of matrix-vector products and the CPU time for these test matrices. This suggests that the GFOM-Extrapolation algorithm converges faster than the FOM-Extrapolation algorithm, which again illustrates that using weighted inner products in the standard Arnoldi process is significant.
Figure 6 plots the convergence behavior of the GFOM algorithm, the Arnoldi-Chebyshev algorithm, the FOM-Extrapolation algorithm and the GFOM-Extrapolation algorithm for the wiki-Talk matrix when α = 0.99, 0.993, 0.995 and 0.997, respectively. It shows that the GFOM-Extrapolation algorithm converges faster than its counterparts. The other three test matrices exhibit similar convergence behaviors.
In summary, all the above numerical experiments show that the GFOM-Power algorithm and the GFOM-Extrapolation algorithm are superior to their counterparts for computing PageRank when the damping factor α is close to 1.
5. Conclusions
In this paper, a generalized FOM (GFOM) algorithm based on weighted inner products is discussed for computing PageRank, as given in Algorithm 2. With the aim of improving the convergence performance, we develop the GFOM-Power algorithm and the GFOM-Extrapolation algorithm by using the power method and the extrapolation method based on Ritz values as acceleration techniques, respectively. Their specific implementations and convergence analyses can be found in Section 3. Numerical experiments in Section 4 illustrate the efficiency of the proposed algorithms. Furthermore, the proposed algorithms are worth trying on other Markov problems, such as ProteinRank and CiteRank.
Acknowledgments
The authors are grateful to the anonymous referees for their constructive comments and valuable suggestions, which greatly improved the original manuscript. This research is supported by the National Natural Science Foundation of China under grant 12101433, and the Two-Way Support Programs of Sichuan Agricultural University (Grant No. 1921993077).
Conflict of interest
The authors declare there is no conflict of interest.