GPBi‐CGstab(L): A Lanczos‐type product method unifying Bi‐CGstab(L) and GPBi‐CG

Lanczos‐type product methods (LTPMs), in which the residuals are defined by the product of stabilizing polynomials and the Bi‐CG residuals, are effective iterative solvers for large sparse nonsymmetric linear systems. Bi‐CGstab(L) and GPBi‐CG are popular LTPMs and can be viewed as two different generalizations of other typical methods, such as CGS, Bi‐CGSTAB, and Bi‐CGStab2. Bi‐CGstab(L) uses stabilizing polynomials of degree L, while GPBi‐CG uses polynomials given by a three‐term recurrence (or equivalently, a coupled two‐term recurrence) modeled after the Lanczos residual polynomials. Thus, Bi‐CGstab(L) and GPBi‐CG generalize the LTPM framework in different directions. In the present paper, we propose novel stabilizing polynomials that combine the above two types of polynomials. The resulting method is referred to as GPBi‐CGstab(L). Numerical experiments demonstrate that the proposed method is more effective than conventional LTPMs.


INTRODUCTION
The bi-conjugate gradient method (Bi-CG) 1 is one of the most basic iterative solvers for large sparse nonsymmetric linear systems of the form Ax = b, where A ∈ R^{n×n} is nonsingular and b ∈ R^n. In the present study, we treat Lanczos-type product methods (LTPMs), 2 which are characterized by residuals of the form r_k := H_k(A) r_k^bicg, where r_k^bicg is the kth Bi-CG residual and H_k(λ) is a so-called stabilizing polynomial of degree k. These methods are also called hybrid Bi-CG methods. By selecting appropriate polynomials H_k(λ), LTPMs can achieve better convergence than the underlying Bi-CG for various problems. For this reason, several LTPMs have been developed over the years.
Studies on the stabilizing polynomials stem from the CG squared method (CGS), 3 which was developed by Sonneveld as an efficient variant of Bi-CG. The CGS residuals are defined by r_k^cgs := R_k^2(A) r_0, where r_0 := b − Ax_0 is an initial residual and R_k(λ) is a Lanczos residual polynomial, that is, the residual polynomial of Bi-CG, with r_k^bicg = R_k(A) r_0. The Bi-CG stabilized method (Bi-CGSTAB) 4 proposed by van der Vorst uses the stabilizing polynomials defined by

H_k(λ) := (1 − ω_k λ) H_{k−1}(λ), H_0(λ) := 1, (1)

where the coefficient ω_k ∈ R is determined, for example, by locally minimizing the residual at each iteration. Presently, Bi-CGSTAB is one of the most popular methods in many fields. However, real stabilizing polynomials of degree 1 do not work well if A ∈ R^{n×n} has complex eigenvalues whose imaginary parts are relatively large compared with the real parts.
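The local minimization that fixes ω_k can be illustrated in a few lines. The following sketch (in NumPy; the function name and the small random test matrix are ours, not from the paper) computes the ω that minimizes ||s − ωAs||_2 for a given vector s.

```python
import numpy as np

def minimize_degree1(s, t):
    """Return the omega minimizing ||s - omega*t||_2, where t = A s."""
    return np.dot(t, s) / np.dot(t, t)

rng = np.random.default_rng(0)
A = np.diag(np.arange(1.0, 6.0)) + 0.1 * rng.standard_normal((5, 5))
s = rng.standard_normal(5)
t = A @ s
omega = minimize_degree1(s, t)
# The minimized residual is orthogonal to t and no larger than ||s||_2.
assert abs(np.dot(t, s - omega * t)) < 1e-10
```

The minimizer is the familiar ratio (t, s)/(t, t) with t = As, which makes the updated residual orthogonal to t; this is the degree-1 local minimization used in Bi-CGSTAB.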
To overcome this difficulty, Gutknecht 5 incorporated second-degree polynomials into the iteration process of Bi-CGSTAB. The resulting method is known as Bi-CGStab2. Subsequently, Sleijpen and Fokkema 6 developed the more general method Bi-CGstab(L), which uses stabilizing polynomials of degree L; that is,

H_{k+L}(λ) := (1 − γ_{k,1}λ − γ_{k,2}λ^2 − ⋯ − γ_{k,L}λ^L) H_k(λ), (2)

where the integer k is a multiple of the parameter L. The coefficients γ_{k,j} ∈ R for j = 1, 2, …, L are usually selected such that the updated residual is locally minimized. The formulation (2) is a natural extension of (1); that is, Bi-CGstab(L) simplifies to Bi-CGSTAB when L = 1. Bi-CGStab2 is also covered by Bi-CGstab(L) in theory; more precisely, when L = 2, Bi-CGstab(2) generates the same residuals as those obtained at even iterations of Bi-CGStab2 (if there is no breakdown). However, the two methods exhibit significantly different convergence properties in actual computations. Bi-CGstab(L) with L > 1 (e.g., L = 2 or 4) is in fact often more effective than Bi-CGSTAB and Bi-CGStab2, especially for highly nonsymmetric matrices with complex eigenvalues close to the imaginary axis.
As another extension of CGS, Fokkema et al. 7 developed generalized CGS methods (GCGS), which use a product of two nearby Bi-CG polynomials (instead of the square of one Bi-CG polynomial). In these methods the stabilizing polynomials are given by coupled two-term recurrences, as are the Lanczos residual polynomials. Thus, GCGS has two recurrence coefficients as parameters, and the conventional CGS and Bi-CGSTAB can be viewed as special cases of GCGS. On the other hand, based on a recurrence property of the Lanczos residual polynomials, Zhang 8 introduced more general and essential polynomials:

H_{k+1}(λ) := (1 + η_k − ζ_kλ) H_k(λ) − η_k H_{k−1}(λ), H_0(λ) := 1, H_1(λ) := 1 − ζ_0λ, (3)

where k = 1, 2, …, and ζ_k, η_k ∈ R are independent parameters. This formulation gives a collection of product-type methods based on Bi-CG, from which some of the conventional LTPMs can be derived. By choosing the two parameters to be identical to those in the Lanczos residual polynomials with the Bi-CG coefficients, H_k(λ) corresponds to R_k(λ), and this leads to CGS. When η_k = 0, (3) simplifies to (1), and we can recover Bi-CGSTAB by selecting ζ_k to minimize the residuals. Bi-CGStab2 can be obtained by minimizing the residual with respect to ζ_k and η_k at even iterations, and with respect to ζ_k at odd iterations with η_k = 0. Zhang suggested determining ζ_k and η_k to minimize the residual at each iteration. The resulting method, named GPBi-CG, is quite competitive among LTPMs for systems with a complex spectrum. Note that, in the actual derivation of the GPBi-CG algorithm, coupled two-term recurrences are utilized (instead of (3)) to construct H_k(λ). For details, we refer to section 4 in Zhang. 8 It is important to develop a comprehensive algorithm for LTPMs, because this may enable us to classify the methods in a simple manner, and the resulting new method may also achieve higher performance by combining the strengths of the existing methods. Bi-CGstab(L) and GPBi-CG are two valuable algorithms, both as specific effective solvers and as frameworks of LTPMs.
However, because there is no inclusion relationship between the two methods, it is important to consider a more general framework that unifies Bi-CGstab(L) and GPBi-CG. Therefore, in the present paper, we propose such a novel method, named GPBi-CGstab(L). The stabilizing polynomials of the new method consist of Lth degree polynomials and relaxation terms defined by the differences of the two previous polynomials; a specific definition is given in Section 3. There are L + 1 parameters that determine the stabilizing polynomials. The resulting method simplifies to Bi-CGstab(L) if one parameter in the relaxation term is set to 0, and it is mathematically equivalent to GPBi-CG in the case of L = 1. Therefore, the conventional LTPMs, such as CGS, Bi-CGSTAB, and Bi-CGStab2, are also included in this framework. As a specific algorithm, it would be natural and effective to select appropriate L + 1 parameters to locally minimize the residuals. Numerical experiments demonstrate that the proposed GPBi-CGstab(L) can obtain better convergence than the conventional LTPMs when solving nonsymmetric linear systems, especially with a complex spectrum.
The remainder of this paper is organized as follows. In the next section, we give derivations of the Bi-CGstab(L) and GPBi-CG algorithms, which are useful for designing the new method. In Section 3, we introduce comprehensive stabilizing polynomials and derive an algorithm for GPBi-CGstab(L) unifying Bi-CGstab(L) and GPBi-CG. In Section 4, we demonstrate the effectiveness of GPBi-CGstab(L) through numerical experiments on model problems. The concluding remarks and future prospects are presented in Section 5.

BI-CGSTAB(L) AND GPBI-CG
In this section, we describe the Bi-CG recurrences used in LTPMs and derive the Bi-CGstab(L) and GPBi-CG algorithms individually. These derivations differ slightly from the original ones but are useful for designing the novel method.

Bi-CG recurrences in LTPMs
We first recall the recurrence relations of standard Bi-CG. Let r_k^bicg and p_k^bicg be the residual and direction vectors of Bi-CG, respectively. It is well known that these vectors satisfy the following coupled two-term recurrences:

r_{k+1}^bicg = r_k^bicg − α_k A p_k^bicg, p_{k+1}^bicg = r_{k+1}^bicg − β_k p_k^bicg. (4)

Here, the initial direction is defined as p_0^bicg := r_0^bicg, and the Bi-CG coefficients α_k and β_k are determined by the bi-orthogonality conditions with respect to r_k^bicg, p_k^bicg, and an initial shadow residual r̃_0.
In the following, H_k(A) is denoted by H_k for simplicity. By multiplying (4) from the left by H_k, we have the following:

H_k r_{k+1}^bicg = H_k r_k^bicg − α_k A H_k p_k^bicg, H_k p_{k+1}^bicg = H_k r_{k+1}^bicg − β_k H_k p_k^bicg. (5)

The residual and direction vectors of LTPMs are expressed as r_k = H_k r_k^bicg and p_k = H_k p_k^bicg, respectively. Therefore, by introducing the intermediate auxiliary vectors r_k^(1) := H_k r_{k+1}^bicg and p_k^(1) := H_k p_{k+1}^bicg into (5), we obtain the update formulas (called a Bi-CG step) that are often used in LTPMs:

r_k^(1) = r_k − α_k (A ⋆ p_k), p_k^(1) = r_k^(1) − β_k p_k, (6)

where ⋆ denotes an explicit matrix-vector multiplication. The associated approximations can be expressed by x_k^(1) = x_k + α_k p_k. From the bi-orthogonality conditions, α_k and β_k can be expressed by

α_k = (r̃_0, r_k) / σ_k, β_k = (r̃_0, A r_k^(1)) / σ_k, (7)

where σ_k := (r̃_0, A H_k p_k^bicg) = (r̃_0, A p_k). Note that there exist different formulations of the Bi-CG coefficients; the formulation (7) is used in, for example, Abe and Sleijpen. 9,10 The vector A r_k^(1) is computed in the iteration process. If A r_k^(1) is obtained by a matrix-vector multiplication (i.e., A r_k^(1) = A ⋆ r_k^(1)), the vector A p_k^(1), usually required for updating the iteration vectors, can be obtained by a vector update (i.e., A p_k^(1) = A ⋆ r_k^(1) − β_k A p_k). These computations are often performed together with (6) as parts of the Bi-CG step. For the numerical stability of these formulations, we refer to Abe and Sleijpen. 9,10 The above Bi-CG recurrences play an important role in the derivation of the LTPM computational schemes in the present study.
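The Bi-CG step above can be sketched directly in NumPy. The following is a sketch under our reading of (6) and (7) (the minus-sign convention for β and the inner-product formulas follow the text; the function name and test matrix are ours). Note that the only matrix-vector product inside the step is A r_k^(1); A p_k^(1) is obtained by a vector update.

```python
import numpy as np

def bicg_step(A, rtilde, x, r, p, Ap):
    """One LTPM-style Bi-CG step, a sketch of (6)-(7):
    sigma = (rtilde, A p),  alpha = (rtilde, r)/sigma,
    r1 = r - alpha*(A p),   x1 = x + alpha*p,
    beta = (rtilde, A r1)/sigma,  p1 = r1 - beta*p,
    A p1 = A r1 - beta*(A p)  (vector update, no extra product)."""
    sigma = rtilde @ Ap
    alpha = (rtilde @ r) / sigma
    r1 = r - alpha * Ap
    x1 = x + alpha * p
    Ar1 = A @ r1                       # the only matrix-vector product
    beta = (rtilde @ Ar1) / sigma
    p1 = r1 - beta * p
    Ap1 = Ar1 - beta * Ap
    return x1, r1, p1, Ap1

rng = np.random.default_rng(2)
n = 8
A = 3.0 * np.eye(n) + 0.2 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = np.zeros(n)
r = b.copy(); p = r.copy(); rtilde = r.copy()
x1, r1, p1, Ap1 = bicg_step(A, rtilde, x, r, p, A @ p)
```

The invariant r1 = b − A x1 is preserved by the step, and Ap1 stays consistent with p1, which is what makes the vector-update shortcut legitimate.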

Variant of Bi-CGstab(L)
Here, we derive a variant of the Bi-CGstab(L) algorithm, which uses the stabilizing polynomials H_k(λ) given by (2). The residual and direction vectors of Bi-CGstab(L) are updated from r_k = H_k r_k^bicg and p_k = H_k p_k^bicg to r_{k+L} = H_{k+L} r_{k+L}^bicg and p_{k+L} = H_{k+L} p_{k+L}^bicg, respectively, where k is a multiple of L. This updating process is counted as one cycle of Bi-CGstab(L), and it consists of repeating the Bi-CG step L times and performing an Lth-degree minimization. 6,11 We assume that an approximation x_k, the corresponding residual r_k = H_k r_k^bicg, and the direction vector p_k = H_k p_k^bicg are given at the beginning of the cycle. At the jth repetition of the Bi-CG steps (j = 1, 2, …, L), we can use the following recurrence relations derived from (5).
As in (6), with r_k^(0) := r_k = H_k r_k^bicg and p_k^(0) := p_k = H_k p_k^bicg, the updating process of the jth repetition can be expressed as follows.
Corresponding to (10) with i = 0, the associated approximations can be expressed with x_k^(0) := x_k. Here, using the iteration vectors, the Bi-CG coefficients α_{k+j−1} and β_{k+j−1} can be computed as in (7). At the beginning of the jth repetition, the new vectors x_k^(j), A^i r_k^(j), and A^i p_k^(j) for i = 0, 1, …, j are generated by (9)-(15). Following the depictions in Sleijpen and van Gijzen, 11 the computational schemes of the first and second repetitions of the Bi-CG steps are displayed as follows.
• Computational scheme of the first repetition (j = 1) of the Bi-CG steps:
• Computational scheme of the second repetition (j = 2) of the Bi-CG steps:
Following (2), the next residual r_{k+L} can be computed by

r_{k+L} = r_k^(L) − γ_{k,1} A r_k^(L) − ⋯ − γ_{k,L} A^L r_k^(L),

where the γ_{k,j} for j = 1, 2, …, L are usually determined by minimizing ||r_{k+L}||_2 (cf. section 4.2.1 in Sleijpen et al. 12). The associated approximation x_{k+L} and the next direction vector p_{k+L} = H_{k+L} p_{k+L}^bicg can also be updated accordingly.
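The Lth-degree minimization above is an ordinary linear least squares problem in the γ_{k,j}. A sketch (NumPy; function name and test matrix are ours) that forms the Krylov block [A r, …, A^L r] and minimizes over its span:

```python
import numpy as np

def degree_L_coefficients(A, r, L):
    """Choose gamma_1..gamma_L minimizing ||r - sum_j gamma_j A^j r||_2.
    Returns the coefficients and the minimized residual."""
    K = np.empty((r.size, L))
    v = r
    for j in range(L):
        v = A @ v          # columns A r, A^2 r, ..., A^L r
        K[:, j] = v
    gamma, *_ = np.linalg.lstsq(K, r, rcond=None)
    return gamma, r - K @ gamma

rng = np.random.default_rng(3)
n, L = 12, 2
A = np.diag(np.linspace(1.0, 4.0, n)) + 0.05 * rng.standard_normal((n, n))
r = rng.standard_normal(n)
gamma, r_new = degree_L_coefficients(A, r, L)
```

By construction the minimized residual is orthogonal to each column A^j r, and its norm never exceeds ||r||_2; this is the per-cycle residual reduction that the γ_{k,j} buy.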
Algorithm 1 presents a variant of Bi-CGstab(L), using the notation of Sleijpen and van Gijzen. 11 Remark 1. There are several implementations (but mathematically equivalent methods) for Bi-CGstab(L). 11,12 Our presented scheme is based on algorithm 3.1 in Sleijpen and van Gijzen, 11 but with a minor change: algorithm 3.1 11 generates Ap_k with a vector update, whereas our variant computes it by explicitly multiplying p_k by A. The former approach is useful for expressing Bi-CGstab(L) in a fashion similar to the induced dimension reduction (IDR)-type methods. 11,13 However, for a larger L, it can be one of the factors that induces a large residual gap (the difference between the recursively updated residual r_k and the explicitly computed residual b − Ax_k) in finite precision arithmetic. 14 In such a case, the explicit multiplications with A in the latter approach help to reduce the residual gap. Therefore, we employ this formulation in the present study. For further discussion of the residual gap and its improvement, we refer to, for example, Aihara et al., 14,15 Sleijpen and van der Vorst, 16 and van der Vorst and Ye. 17
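The residual gap mentioned in Remark 1 is straightforward to monitor. A small diagnostic helper (ours, for illustration; not part of any of the algorithms) is:

```python
import numpy as np

def residual_gap(A, b, x, r):
    """Relative gap between the recursively updated residual r and the
    explicitly computed (true) residual b - A x."""
    return np.linalg.norm(b - A @ x - r) / np.linalg.norm(b)

rng = np.random.default_rng(6)
A = rng.standard_normal((8, 8))
x = rng.standard_normal(8)
b = rng.standard_normal(8)
r = b - A @ x          # a residual consistent with x: gap is zero
gap = residual_gap(A, b, x, r)
```

In a Krylov iteration the recursively updated r drifts away from b − Ax in finite precision; tracking this quantity shows how far the attainable accuracy is degraded.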

Variant of GPBi-CG
Next, we derive a variant of the GPBi-CG algorithm. To this end, we slightly rewrite the stabilizing polynomials (3) as in the following formulation.
H_{k+1}(λ) = (1 − ζ_kλ) H_k(λ) − η_k λ G_{k−1}(λ), (17)

where k = 1, 2, …, and the auxiliary polynomials G_{k−1}(λ) are defined as

G_{k−1}(λ) := (H_{k−1}(λ) − H_k(λ)) / λ. (18)

We now consider the updating process of the iteration vectors. From (17), the residual and direction vectors of GPBi-CG can be expressed by

r_{k+1} = r_k^(1) − ζ_k A r_k^(1) − η_k y_k^(1), p_{k+1} = p_k^(1) − ζ_k A p_k^(1) − η_k u_k^(1),

where r_k^(1) := H_k r_{k+1}^bicg and p_k^(1) := H_k p_{k+1}^bicg as in (6), and y_k^(1) := A G_{k−1} r_{k+1}^bicg and u_k^(1) := A G_{k−1} p_{k+1}^bicg are also introduced. If these auxiliary vectors are available, the parameters ζ_k and η_k can be computed such that ||r_{k+1}||_2 is minimized (cf. section 5.3 in Zhang 8). By introducing z_k^(1) := G_{k−1} r_{k+1}^bicg, the associated approximation can be expressed by x_{k+1} = x_k^(1) + ζ_k r_k^(1) + η_k z_k^(1). Assume that the iteration vectors are given at the beginning of the (k + 1)th iteration. Then, we can obtain x_k^(1), A^i r_k^(1), and A^i p_k^(1) for i = 0, 1, by performing the Bi-CG step described in (16). Therefore, we consider how to generate the additional vectors y_k^(1), u_k^(1), and z_k^(1) at the (k + 1)th iteration. From (4) and (18), we have the following relations.
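The two-parameter minimization over (ζ_k, η_k) amounts to a 2 × 2 normal-equation solve, which has a closed form. The following sketch (NumPy; names ours) minimizes ||r − ζt − ηy||_2 by Cramer's rule and can be checked against a generic least squares solver:

```python
import numpy as np

def zeta_eta(r, t, y):
    """Minimize ||r - zeta*t - eta*y||_2 over (zeta, eta) via the 2x2
    normal equations (the two-parameter local minimization of GPBi-CG)."""
    tt, yy, ty = t @ t, y @ y, t @ y
    tr, yr = t @ r, y @ r
    denom = tt * yy - ty * ty          # Gram determinant of [t, y]
    zeta = (yy * tr - ty * yr) / denom
    eta = (tt * yr - ty * tr) / denom
    return zeta, eta

rng = np.random.default_rng(7)
r = rng.standard_normal(10)
t = rng.standard_normal(10)          # plays the role of A r_k^(1)
y = rng.standard_normal(10)          # plays the role of y_k^(1)
zeta, eta = zeta_eta(r, t, y)
```

The denominator vanishes only when t and y are parallel, which corresponds to the degenerate case in which the relaxation direction adds nothing to the degree-1 stabilization.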
By introducing the auxiliary vectors in (19), we obtain the following formulas to compute the vectors y_k^(1) and u_k^(1).
Here, note that r_{k−1}^(1), p_{k−1}^(1), and q_k (= A p_{k−1}^(1)) have already been generated in the previous iteration, and α_k, β_k, and A p_k are given in the Bi-CG step. Because G_k = ζ_k H_k + η_k G_{k−1} holds from the definitions of the polynomials, we also have the corresponding vector recurrence, and we can compute z_k^(1) and z_k := G_{k−1} r_k^bicg recursively as follows.
Thus, all of the auxiliary vectors required for performing the iterations are obtained. Algorithm 2 presents a variant of GPBi-CG. The first iteration of GPBi-CG actually corresponds to that of Bi-CGSTAB; therefore, for ease of comprehension, we display one Bi-CGSTAB iteration separately from the iterations for GPBi-CG. For the notation, we refer to Algorithm 1. Note that the vectors r_k^(1), p_k^(1), and A p_k^(1) are saved as r′, p′, and q, respectively, to be carried over to the next iteration.
β = (r̃_0^⊤ r_1)/σ, p = r − βp
8: Set r′ = r_0, p′ = p_0, and q = p_1
9: Compute ω to minimize ‖r_0 − ωr_1‖_2
10: z = ωr_0, x = x + z
11: r = r_0 − ωr_1, p = p_0 − ωp_1
% Iterations for GPBi-CG
12: while ‖r_0‖_2 > tol do
13: Set r′ = r_0, p′ = p_0, and q = p_1
20: Compute ζ and η to minimize ‖r_0 − ζr_1 − ηy‖_2
Remark 2. Some related methods, which are mathematically equivalent to GPBi-CG but use different implementations, have been developed. 2,10,18,19 Algorithm 2 is also an implementation mathematically equivalent to them, but it uses partly different formulations from each. We note that our variant is derived from a novel perspective: we see from (17) that the stabilizing polynomial H_{k+1}(λ) consists of two parts, namely the multiplication of the one-degree polynomial (1 − ζ_kλ) by H_k(λ) and the relaxation term −η_k λ G_{k−1}(λ) = η_k (H_k(λ) − H_{k−1}(λ)). This interpretation is useful for incorporating higher-order polynomials into GPBi-CG and for constructing the new method described in the next section.

GPBI-CGSTAB(L)
In this section, we propose GPBi-CGstab(L), which unifies Bi-CGstab(L) and GPBi-CG described above. The basic idea in deriving the new method is to extend the polynomials (2) and (17) to the following.
H_{k+L}(λ) := (1 − γ_{k,1}λ − γ_{k,2}λ^2 − ⋯ − γ_{k,L}λ^L) H_k(λ) − η_k λ G_{k−1}(λ), (21)

where k is a multiple of L and the auxiliary polynomials G_{k−1}(λ) are defined as

G_{k−1}(λ) := (H_{k−L}(λ) − H_k(λ)) / λ. (22)

The coefficients γ_{k,j} ∈ R for j = 1, 2, …, L and η_k ∈ R are independent parameters. Obviously, (21) simplifies to (2) when η_k = 0 (i.e., the relaxation term −η_k λ G_{k−1}(λ) vanishes), and (21) is equivalent to (17) (and (3)) in the case of L = 1. GPBi-CGstab(L) is a novel LTPM that uses the stabilizing polynomials defined by (21) with the L + 1 parameters γ_{k,1}, …, γ_{k,L}, η_k.
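The claimed equivalence with (3) for L = 1 can be verified numerically on the polynomial coefficients. The following sketch (NumPy; a consistency check under our reading of (17), (18), and (21)-(22), with helper names of our own) iterates the coupled form H_{k+1} = (1 − ζλ)H_k − ηλG_{k−1}, G_k = ζH_k + ηG_{k−1} and compares it with Zhang's three-term recurrence:

```python
import numpy as np

def pmul(a, b):
    """Product of polynomials given as ascending coefficient arrays."""
    return np.convolve(a, b)

def padd(a, b):
    """Sum of polynomials given as ascending coefficient arrays."""
    out = np.zeros(max(len(a), len(b)))
    out[:len(a)] += a
    out[:len(b)] += b
    return out

rng = np.random.default_rng(1)
H_prev = np.array([1.0])             # H_0(l) = 1
zeta0 = 0.7
H = np.array([1.0, -zeta0])          # H_1(l) = 1 - zeta0*l
G = padd(H_prev, -H)[1:]             # G_0 = (H_0 - H_1)/l

for _ in range(3):
    zeta, eta = rng.standard_normal(2)
    lam_G = np.concatenate(([0.0], G))                     # l * G_{k-1}
    H_coupled = padd(pmul([1.0, -zeta], H), -eta * lam_G)  # coupled form
    H_three = padd(pmul([1.0 + eta, -zeta], H), -eta * H_prev)  # form (3)
    assert np.allclose(padd(H_coupled, -H_three), 0.0)
    G = padd(zeta * H, eta * G)                            # G_k update
    H_prev, H = H, H_coupled
```

The two recurrences agree coefficient-by-coefficient at every step, which is exactly the statement that (21) with L = 1 reproduces (17) and hence (3).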

Recursion formulas for iteration vectors
From (21) and (22), the updated residual and direction vectors are expressed by

r_{k+L} = r_k^(L) − γ_{k,1} A r_k^(L) − ⋯ − γ_{k,L} A^L r_k^(L) − η_k y_k^(L), (23)
p_{k+L} = p_k^(L) − γ_{k,1} A p_k^(L) − ⋯ − γ_{k,L} A^L p_k^(L) − η_k u_k^(L), (24)

where y_k^(L) := A G_{k−1} r_{k+L}^bicg and u_k^(L) := A G_{k−1} p_{k+L}^bicg. The associated approximation can be expressed as

x_{k+L} = x_k^(L) + γ_{k,1} r_k^(L) + ⋯ + γ_{k,L} A^{L−1} r_k^(L) + η_k z_k^(L),

where z_k^(L) := G_{k−1} r_{k+L}^bicg. Assuming that x_k, r_k, p_k, and the auxiliary vectors are given at the beginning of the cycle, the vectors x_k^(L), A^i r_k^(L), and A^i p_k^(L) for i = 0, 1, …, L can be generated by repeating the Bi-CG step as in Bi-CGstab(L) (see (9)-(15)). Therefore, here we consider computations for the additional vectors y_k^(L), u_k^(L), and z_k^(L). As an extension of (19), we can use the following recurrence relations.
By introducing the intermediate auxiliary vectors into (25), and rewriting the coefficients as α_j := α_{k+j} and β_j := β_{k+j}, we obtain the following recursions to compute y_k^(L) and u_k^(L).
Here, except for q_k^(0), the vectors q_k^(j−1) are not yet available. We therefore construct additional recursions to compute q_k^(j−1) for j > 1. Based on (8), we employ the following relations.
Here, the index i is set to i = 1, 2, …, L − j + 1 for each j > 1. Note that, in addition to q_k^(0), the required starting vectors were also obtained in the previous cycle. The computational schemes of (27) and (28) at the jth repetition (j = 2, 3, 4) with L = 4 are summarized as follows.
• Computational scheme to obtain q_k^(1) at the second repetition (j = 2) with L = 4:
• Computational scheme to obtain q_k^(2) at the third repetition (j = 3) with L = 4:
• Computational scheme to obtain q_k^(3) at the fourth repetition (j = 4) with L = 4:
Note that, in contrast to the repetitions for (9)-(13), the number of vector updates decreases as j increases.
Next, we consider the recursion formulas to generate z_k^(L). From (21) and (22), it holds that z_k^(L) and z_k can be computed recursively, where u_k^(j−1) for j = 1, 2, …, L are given in (26). We now have all the recursion formulas for updating the iteration vectors.

Choice of parameters and a specific algorithm
We describe a practical choice of the parameters γ_{k,1}, …, γ_{k,L}, η_k. As in Bi-CGstab(L) and GPBi-CG, it is natural to determine the L + 1 parameters to minimize ||r_{k+L}||_2 at each cycle; that is, from (23), we solve the least squares problem

min ||r_k^(L) − B_k c||_2, B_k := [A r_k^(L), …, A^L r_k^(L), y_k^(L)], c := [γ_k^⊤, η_k]^⊤, (31)

for γ_k := [γ_{k,1}, …, γ_{k,L}]^⊤ ∈ R^L and η_k ∈ R. It is well known that the least squares problem (31) can be converted into the normal equation

B_k^⊤ B_k c = B_k^⊤ r_k^(L). (32)

For a modest value of L, (32) is easy to solve with a direct method, such as the Cholesky factorization method. As noted in Sleijpen et al., 12 for stability reasons, it would also be useful to solve (31) directly with the QR factorization method if B_k is ill-conditioned. Algorithm 3 presents a GPBi-CGstab(L) algorithm. Because the first cycle of GPBi-CGstab(L) corresponds to that of Bi-CGstab(L), it is displayed separately, as in Algorithm 2. To perform the cycles of GPBi-CGstab(L), the vector updates (26)-(29) are combined with the Bi-CG steps of Bi-CGstab(L) (Algorithm 1). Then, after the determination of the parameters by (31), the updates (23), (24), and (30) are performed. Note that the vectors r_{k−L}^(L), p_{k−L}^(L), and the associated vectors A^{i−1} r_{k−L}^(L) and A^i p_{k−L}^(L) are saved as r′, p′, s_{i−1}, and q_i, respectively, to be carried over to the next cycle. It is evident that GPBi-CGstab(L) (Algorithm 3) simplifies to Bi-CGstab(L) (Algorithm 1) if η_k = 0, and to GPBi-CG (Algorithm 2) if L = 1. Consequently, CGS, Bi-CGSTAB, and Bi-CGStab2 can also be viewed as special cases of the presented framework; for these derivations, we refer to Zhang. 8 Our variants of Bi-CGstab(L) (Algorithm 1) and GPBi-CG (Algorithm 2) are mathematically equivalent to the original Bi-CGstab(L) 6 and GPBi-CG, 8 respectively, but the variants use different implementations and require slightly higher computational costs, as shown in Section 3.4.
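Both solution routes for the (L + 1)-parameter minimization are easy to sketch side by side. The following (NumPy; function name, test matrix, and the stand-in vector y are ours) builds B with columns [A r, …, A^L r, y] and solves the problem via the normal equations and via QR, the alternative suggested for ill-conditioned B_k:

```python
import numpy as np

def minimize_L_plus_1(A, r, y, L):
    """Sketch of (31): find c = [gamma_1..gamma_L, eta] minimizing
    ||r - B c||_2 with B = [A r, ..., A^L r, y]."""
    B = np.empty((r.size, L + 1))
    v = r
    for j in range(L):
        v = A @ v
        B[:, j] = v
    B[:, L] = y
    # Normal-equation route (32): cheap for modest L.
    c_ne = np.linalg.solve(B.T @ B, B.T @ r)
    # QR route: preferable when B is ill-conditioned.
    Q, R = np.linalg.qr(B)
    c_qr = np.linalg.solve(R, Q.T @ r)
    return c_qr, c_ne

rng = np.random.default_rng(4)
n, L = 15, 3
A = np.diag(np.linspace(1.0, 5.0, n)) + 0.05 * rng.standard_normal((n, n))
r = rng.standard_normal(n)
y = rng.standard_normal(n)
c_qr, c_ne = minimize_L_plus_1(A, r, y, L)
```

On a well-conditioned B the two routes agree; the QR route avoids squaring the condition number, which is the stability concern raised for (32).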
However, the essence of the presented variants is to enable us to unify the two conventional frameworks (i.e., Bi-CGstab(L) and GPBi-CG), and to derive the mathematically novel method GPBi-CGstab(L). Because our main purpose is to provide this new framework of LTPMs, we will not discuss enhanced (but mathematically equivalent) implementations of Bi-CGstab(L) and GPBi-CG themselves; for refined variants of them, see, for example, Sleijpen et al. 12 and Abe and Sleijpen. 10

Advantage of the comprehensive stabilizing polynomials
We discuss advantages of the stabilizing polynomials of GPBi-CGstab(L) compared with those of the underlying Bi-CGstab(L) and GPBi-CG. In the following, we distinguish the stabilizing polynomials and residuals of the methods by the superscripts "-stab(L)," "gp-," and "gp-stab(L)." We first confirm the relations between the residuals and the Krylov subspace, which we recall from the definitions (21) and (22). Then, we consider the effects of the relaxation term, which is a part shared with GPBi-CG (but absent in Bi-CGstab(L)). In (33), if the Lth-degree polynomial (I − γ_{k,1}A − ⋯ − γ_{k,L}A^L) is a significant factor in minimizing the residual, η_k is necessarily selected to be small, and the influence of the relaxation term is almost eliminated. Otherwise, the L + 1 parameters including η_k are appropriately selected to locally minimize the residual; the updated GPBi-CGstab(L) residual can then be smaller than in the case of choosing η_k = 0 as in Bi-CGstab(L). In this connection, we can compare the residual norms obtained in the second cycle of GPBi-CGstab(L) and Bi-CGstab(L); note that the first cycle of GPBi-CGstab(L) is the same as that of Bi-CGstab(L). As a result, GPBi-CGstab(L) has the advantages of both Bi-CGstab(L) and GPBi-CG, and this may lead to higher performance than the conventional methods. Note that we have focused here on the differences between the stabilizing polynomials when updating the residuals, because it seems difficult to compare r_k^-stab(L), r_k^gp-, and r_k^gp-stab(L) directly. We will analyze the theoretical relations between the residuals in the future. In Section 4, we demonstrate by numerical experiments that GPBi-CGstab(L) with L > 1 actually achieves better convergence than Bi-CGstab(L) and GPBi-CG. Remark 3. We note that, although GPBi-CGstab(L) performs residual minimization with L + 1 parameters, it should not be confused with Bi-CGstab(L + 1).
Because the residuals of GPBi-CGstab(L) and Bi-CGstab(L + 1) are generated in different Krylov subspaces at each cycle, they are essentially different methods. However, similar to the relations between GPBi-CG, Bi-CGStab2, and Bi-CGstab(2), the residuals of Bi-CGstab(L + 1) can in theory be generated from a variation of GPBi-CGstab(L). If the parameters γ_{k,j} for j = 2, 3, …, L and η_k are set to 0 at odd cycles, the stabilizing polynomials (21) are converted as follows.
where k = 0, (L + 1), 2(L + 1), …, and G_k(λ) := (H_k(λ) − H_{k+1}(λ)) / λ. Then, the resulting algorithm generates the Bi-CGstab(L + 1) residuals at even cycles when the parameters are selected to locally minimize the residuals. However, this is only a theoretical aspect, and it might be unsuitable as an implementation of Bi-CGstab(L + 1) for systems with a complex spectrum, because the resulting algorithm may suffer from numerical instability caused by the polynomials of degree one at odd cycles, as in Bi-CGStab2*.

*The residuals obtained at even iterations of Bi-CGStab2 are the same as those generated by Bi-CGstab(2) in exact arithmetic. From this perspective, Bi-CGStab2 and Bi-CGstab(2) can be viewed as mathematically equivalent methods. However, their numerical behavior differs significantly for systems with a complex spectrum, because Bi-CGStab2 uses stabilizing polynomials of degree one at odd iterations, as in Bi-CGSTAB. Indeed, Bi-CGstab(2) is numerically more stable than Bi-CGStab2 in our experiments; see Section 4.1.

Computational costs and memory requirements

Table 1 summarizes the computational costs per MV and the memory requirements for Bi-CGstab(L), GPBi-CG, and GPBi-CGstab(L). Throughout this paper, "MV" stands for a matrix-vector multiplication with the coefficient matrix A. Note that Bi-CGstab(L) and GPBi-CGstab(L) require 2L MVs per cycle, while GPBi-CG requires 2 MVs per iteration; on average, all the methods require 1 MV per Krylov dimension. For Bi-CGstab(L) and GPBi-CG, we show the costs for the original implementations in Sleijpen and Fokkema 6 and Zhang 8 and for our variants (Algorithms 1 and 2). In the table, AXPY and DOT stand for a vector update of the form ax + y and an inner product x^⊤y, respectively, with a ∈ R and x, y ∈ R^n. Following the related studies, 10,13 the form ax or x + y is counted as 0.5 AXPYs in order to compare the theoretical number of arithmetic operations. The costs for checking the stopping rule are not included. Table 1 indicates that, for a modest value of L (e.g., L ≤ 4), the computational costs for AXPYs and DOTs of GPBi-CGstab(L) are not very high compared with those of Bi-CGstab(L) and GPBi-CG. As mentioned above, there are several formulations for computing the Bi-CG coefficients, and they involve different numbers of DOTs. For instance, the numerator of α_k can be converted by (r̃_0, r_k) = −γ_{k−L,L}(r̃_0, A^L r_{k−L}^(L)) using the bi-orthogonality conditions, which saves 0.5/L DOTs in our variants.

The last column of Table 1 displays the number of n-vectors that have to be stored in the algorithms. Here, the memory for the coefficient matrix (O(n) storage in many cases of sparse matrices) and for the right-hand side vector is not counted. In Algorithms 2 and 3, the vector v can share the storage of r′ (or p′), and we count the amount based on such an implementation. In theory, the number of n-vectors can be reduced by one by overwriting q_0 onto the storage of v, but this requires modifications of the algorithms and leads to a slightly more complicated implementation.

NUMERICAL EXPERIMENTS
In this section, we present numerical experiments to demonstrate the effectiveness of the proposed GPBi-CGstab(L) algorithm. Through some model problems, we compare the convergence of CGS, Bi-CGSTAB, Bi-CGStab2, Bi-CGstab(L), GPBi-CG, and GPBi-CGstab(L). Numerical calculations were conducted in double-precision floating-point arithmetic on a PC (Intel Xeon E3-1270 v5 CPU with 32 GB of RAM) equipped with MATLAB R2018a. Bi-CGstab(L) and GPBi-CGstab(L) were implemented based on Algorithms 1 and 3, respectively, and the others were based on algorithms given by Zhang. 8 The iterations were started with x_0 = 0 and were stopped when the relative residual 2-norm ||r_k||_2 / ||b||_2 became less than 10^{−12} for all the methods. The right-hand side vector b was given as b = Ax* with the exact solution x* := (1, 1, …, 1)^⊤ for the given matrices. The initial shadow residual r̃_0 was set to r_0 (= b). Different conditions were set for each example below.
Note that Bi-CGstab(L) and GPBi-CGstab(L) can be implemented in different ways, which may produce different numerical results. In particular, there are several approaches to performing the residual minimization. We have used the backslash command in MATLAB for the normal equation based on (32). The other parts of the methods have been implemented naively based on Algorithms 1 and 3. We will not further discuss the details of the implementations in the present paper, but will seek more effective implementations in the future.
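For concreteness, the simplest LTPM in the comparison can be sketched in the setting of these experiments (x_0 = 0, r̃_0 = r_0, relative residual stopping rule). The following is a plain NumPy implementation of van der Vorst's Bi-CGSTAB on an illustrative random test matrix of our own; it is a textbook sketch, not the MATLAB code used for the experiments.

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, maxiter=None):
    """Textbook Bi-CGSTAB with x0 = 0 and rtilde = r0.
    Returns (x, converged_flag)."""
    n = b.size
    maxiter = 2 * n if maxiter is None else maxiter
    x = np.zeros(n)
    r = b.copy()                 # r0 = b - A x0 = b
    rtilde = r.copy()            # shadow residual
    p = r.copy()
    rho = rtilde @ r
    bnrm = np.linalg.norm(b)
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rho / (rtilde @ Ap)
        s = r - alpha * Ap       # intermediate residual
        As = A @ s
        omega = (As @ s) / (As @ As)   # degree-1 local minimization
        x += alpha * p + omega * s
        r = s - omega * As
        if np.linalg.norm(r) < tol * bnrm:
            return x, True
        rho_new = rtilde @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * Ap)
        rho = rho_new
    return x, False

n = 50
rng = np.random.default_rng(5)
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = A @ np.ones(n)               # exact solution x* = (1, ..., 1)^T
x, ok = bicgstab(A, b)
```

Note that this textbook form uses the common plus-sign convention for β, whereas Algorithms 1-3 in the text use the formulation (7); the two are mathematically equivalent iterations.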

Numerical example 1
We first show that the proposed method can be a promising alternative to the conventional LTPMs when solving systems with a complex spectrum. To this end, we employ the following two Toeplitz matrices: Toeplitz 1 is motivated by the test matrix in Gutknecht, 5 and the eigenvalues of this matrix are known to be distributed in a radial fashion on the complex plane. 20 The specific spectrum is given in Figure 1. Toeplitz 2 is well-known as Grcar's matrix, and Figure 2 displays its eigenvalues computed by the MATLAB function eig. For these types of problems, higher-order stabilizing polynomials are expected to be more effective. Figures 3 and 4 display the convergence histories of the conventional and proposed LTPMs for Toeplitz 1 and 2, respectively. The plots show the number of MVs on the horizontal axis versus the log 10 of the relative residual 2-norm (||r k || 2 ∕||b|| 2 ) on the vertical axis. Here, the parameter L was set to 2 in Bi-CGstab(L) and GPBi-CGstab(L).
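Grcar's matrix (Toeplitz 2) has a simple explicit banded form: ones on the diagonal and on the first few superdiagonals (three, in the usual definition) and −1 on the first subdiagonal. A sketch of its construction, with a function name of our own choosing, is:

```python
import numpy as np

def grcar(n, k=3):
    """Grcar's matrix: ones on the diagonal and the first k superdiagonals,
    -1 on the first subdiagonal."""
    A = np.zeros((n, n))
    for d in range(k + 1):
        A += np.diag(np.ones(n - d), d)   # diagonal and superdiagonals
    A -= np.diag(np.ones(n - 1), -1)      # subdiagonal
    return A

A = grcar(200)
```

Because the matrix is Toeplitz and strongly nonnormal, its eigenvalues lie on curves in the complex plane well away from the real axis, which is why degree-1 real stabilizing polynomials struggle on it.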
From Figure 3, we can make the following observations for Toeplitz 1. The residual norms of CGS and Bi-CGSTAB diverge and stagnate, respectively, and their iterations cannot proceed owing to breakdown. Bi-CGStab2, which performs second-degree minimizations, converges, although it exhibits large oscillations in the residual norms. Bi-CGstab(2) and GPBi-CG converge slightly faster than Bi-CGStab2; their speeds of convergence are roughly the same. The residual norms of the proposed GPBi-CGstab(2) decrease significantly faster than those of the other methods, and GPBi-CGstab(2) is the only method to converge within 2n MVs (i.e., the theoretical upper bound of MVs required for successful convergence).
For Toeplitz 2, none of the methods converges within 2n MVs, but Bi-CGstab(2) and GPBi-CGstab(2) do converge, as displayed in Figure 4. Specifically, CGS and Bi-CGSTAB break down, and both Bi-CGStab2 and GPBi-CG converge only at around 3000 MVs. Thus, together with the above results, Bi-CGstab(2) appears to be numerically more stable than Bi-CGStab2, although the two produce the same residuals at even iterations in exact arithmetic, and it is also more effective than GPBi-CG for this problem. Nevertheless, GPBi-CGstab(2) (the new method unifying Bi-CGstab(2) and GPBi-CG) achieves much faster convergence than Bi-CGstab(2). We now proceed to compare the convergence of Bi-CGstab(L) and GPBi-CGstab(L) for L ≥ 2. Table 2 shows the number of MVs required for the successful convergence of the two methods with L = 2, 3, …, 10 for the Toeplitz matrices. The last two columns display the average (Ave.) and the standard deviation (SD) of the number of MVs.
From Table 2, we can observe the following. Compared with the case L = 2, the speed of convergence of Bi-CGstab(L) and GPBi-CGstab(L) can be enhanced by increasing L, but the gain seems limited for modest values of L. GPBi-CGstab(L) has a smaller SD than Bi-CGstab(L), while the two methods exhibit similar convergence for L ≥ 4. For small values of L (i.e., L = 2, 3), GPBi-CGstab(L) converges clearly faster than Bi-CGstab(L) and comes closer to the optimal number of MVs. This is an advantage of the proposed method, because the computational costs per MV increase with increasing L, and it is more efficient to use as small an L as possible when similar convergence behavior is obtained.
Remark 4. We briefly remark on the validity of the considerations in Section 3.3. Figure 5 displays the relative residual 2-norms of Bi-CGstab(2), GPBi-CG, GPBi-CGstab(2), and GPBi-CGstab(4) during the early iterations for Toeplitz 1. The plots show the number k on the horizontal axis versus the log_10 of the relative residual 2-norm (||r_k||_2 / ||b||_2) on the vertical axis. Bi-CGstab(2), GPBi-CG, and GPBi-CGstab(2) generate the same residuals for k = 2. (Note that the second residual of GPBi-CG is the same as that of Bi-CGStab2 in theory.) The residuals obtained by GPBi-CGstab(2) are smaller than those obtained by GPBi-CG for k = 4, 6, …, and GPBi-CGstab(4) generates even smaller residuals for k = 4, 8, …. Thus, the (L + 1)-dimensional minimization in GPBi-CGstab(L) seems to be more effective than performing residual minimization over a two-dimensional space L times, as in GPBi-CG. To compare Bi-CGstab(2) and GPBi-CGstab(2) in detail, we also show the coefficients of the stabilizing polynomials and the residual norms of the two methods in Table 3. As described in Section 3.3, we have ||r_4^gp-stab(L)||_2 ≤ ||r_4^-stab(L)||_2 at the second cycle. In this case, however, the second-degree polynomial is a good factor for reducing the residual, and η_2 is selected to be small (i.e., we have r_4^gp-stab(L) ≈ r_4^-stab(L)). Then, at the third cycle, the parameters γ_{4,1}, γ_{4,2}, and η_4 are appropriately selected to minimize the residual, and the updated residual of GPBi-CGstab(2) is visibly smaller than that of Bi-CGstab(2).

Numerical example 2
Next, we consider various types of nonsymmetric matrices that differ in size, sparsity, condition number, and application discipline, to show the reliability and robustness of the proposed method. Specifically, motivated by several papers (e.g., Sogabe et al., 21 Tanio and Sugihara, 22 and van der Vorst and Ye 17 ), we take test matrices from the SuiteSparse Matrix Collection. 23 Table 4 lists the characteristics of the test matrices; for each matrix, it shows its dimension (n), the number of nonzero entries (nnz), the 2-norm condition number (κ_2(A)), and the application discipline. The condition number is displayed only if it is given in the above collection. Here, 14 matrices were employed, but we note that similar numerical results can be obtained for many other matrices. Table 5 shows the number of MVs required for the successful convergence of CGS, Bi-CGSTAB, Bi-CGStab2, Bi-CGstab(L), GPBi-CG, and GPBi-CGstab(L) for the test matrices. In the table, these methods are abbreviated as CGS, -STAB, -Stab2, -stab(L), GP-, and GP-stab(L), respectively. For each matrix, the smallest number of MVs is displayed in boldface with an underline, and the second smallest is displayed with an underline only. We set the maximum number of MVs to 2n, and the symbol † indicates that there was no convergence within this bound. Table 6 shows the explicitly computed relative residual 2-norm (||b − Ax_k||_2/||b||_2) at termination. Here, an underlined value denotes that ||b − Ax_k||_2/||b||_2 ≥ 10^{−9} holds despite the fact that the stopping criterion ||r_k||_2/||b||_2 < 10^{−12} is satisfied; that is, there is a large residual gap. The symbol ‡ indicates that the residual norms diverge or that breakdown occurs.
We now comment on the choice of the parameter L for Bi-CGstab(L) and GPBi-CGstab(L). As mentioned in Section 3.3, the residuals of the two methods belong to the same Krylov subspace after performing the same number of cycles. From this perspective, we regard Bi-CGstab(L) and GPBi-CGstab(L) as homogeneous short-recurrence Krylov subspace methods, and we primarily compare their convergence. However, in terms of the number of parameters, because GPBi-CGstab(L) performs residual minimization using L + 1 parameters at each cycle, it might also be fair to compare it with Bi-CGstab(L + 1), in which the residual is minimized over an (L + 1)-dimensional space. The additional GPBi-CG-type parameter evidently helps to enhance the convergence of GPBi-CGstab(L): even when comparing GPBi-CGstab(L) with Bi-CGstab(L + 1), the number of MVs of the proposed method is smaller in many cases. The approximate solutions obtained by GPBi-CGstab(L) are sufficiently accurate for all the matrices, although there are cases where the conventional LTPMs have a large residual gap. We note that, in case a large residual gap arises in the proposed method, the refined techniques presented in, for example, Aihara et al., 15 Sleijpen and van der Vorst, 16 and van der Vorst and Ye, 17 may be useful to reduce it. We also note that a modest value of L, such as L = 2, 3, or 4 above, is a suitable choice for the proposed method in terms of not only computational costs but also numerical stability. In our experience, choosing a larger L (e.g., L > 10) often leads to a loss of convergence speed.

Numerical example 3
Finally, we present numerical results with preconditioning. We use the preconditioned algorithms, which can be derived by applying the methods to the right-preconditioned system Ãx̃ = b, where Ã = AK^{−1} and x̃ = Kx for a preconditioner K. In CGS, Bi-CGSTAB, Bi-CGStab2, and GPBi-CG, we can update the approximations x_k recursively, instead of x̃_k, through some changes of variables (cf. van der Vorst 4 ). Although Bi-CGstab(L) and GPBi-CGstab(L) can also be implemented to update the approximations x_k, they require many additional vector updates for obtaining auxiliary vectors such as K^{−1}(AK^{−1})^j r_k^{(j)}. Therefore, for the two methods, we use simpler algorithms that update the approximations x̃_k, and x_k = K^{−1}x̃_k is computed once when the updates are terminated. This approach can be easily implemented with lower computational costs and, in our experience, is slightly less affected by rounding errors than the former approach. Tables 7 and 8 show the number of MVs and the explicitly computed relative residual 2-norm, respectively, of the methods with preconditioning for the test matrices in Table 4. Here, ILU(0) 24 was employed as the preconditioner, and we refer to the previous subsection for the other computational conditions. Note that each preconditioned algorithm requires the same number of multiplications by K^{−1} as MVs. From Tables 5 to 8, we observe that the convergence of GPBi-CGstab(L) can be significantly enhanced by preconditioning, as in the conventional LTPMs. Moreover, the convergence speed and the attainable accuracy of the approximate solutions for the preconditioned GPBi-CGstab(L) compare favorably with those of the conventional LTPMs with preconditioning.
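The right-preconditioning structure described above can be sketched compactly. The following NumPy example is a plain Bi-CGSTAB iteration, not the paper's GPBi-CGstab(L), and it substitutes a simple Jacobi preconditioner K = diag(A) for ILU(0) to stay self-contained; what it illustrates is the stated approach of running the recursion entirely in terms of x̃_k and Ã = AK^{−1}, then forming x = K^{−1}x̃ once at the end.

```python
import numpy as np

def bicgstab_right_jacobi(A, b, tol=1e-12, maxiter=500):
    """Bi-CGSTAB applied to the right-preconditioned system
    (A K^{-1}) xt = b with a Jacobi preconditioner K = diag(A).
    The recursion updates xt only; x = K^{-1} xt is formed once
    at the end, mirroring the simpler approach in the text."""
    d = np.diag(A).copy()
    atil = lambda v: A @ (v / d)          # action of A K^{-1}
    n = len(b)
    xt = np.zeros(n)
    r = b.copy()                          # residual of the tilde system
    rhat = r.copy()
    rho = alpha = omega = 1.0
    p = np.zeros(n)
    v = np.zeros(n)
    for k in range(1, maxiter + 1):
        rho_new = rhat @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = atil(p)
        alpha = rho_new / (rhat @ v)
        s = r - alpha * v
        t = atil(s)
        omega = (t @ s) / (t @ t)         # local residual minimization
        xt = xt + alpha * p + omega * s
        r = s - omega * t
        rho = rho_new
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return xt / d, k                      # recover x = K^{-1} xt once

# usage on a diagonally dominant test matrix (illustrative only)
rng = np.random.default_rng(3)
n = 100
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x, iters = bicgstab_right_jacobi(A, b)
assert np.linalg.norm(b - A @ x) / np.linalg.norm(b) < 1e-8
```

Note that for right preconditioning the residual of the tilde system, b − Ãx̃_k, coincides with the true residual b − Ax_k, so the stopping test needs no extra K^{−1} application; checking the explicitly computed residual after recovering x remains a useful safeguard against a residual gap.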