Methods for the polynomial eigenvalue problem sometimes need to be followed by an iterative refinement process to improve the accuracy of the computed solutions. This can be accomplished by means of a Newton iteration tailored to matrix polynomials. The computational cost of this step is usually higher than the cost of computing the initial approximations, due to the need to solve multiple linear systems of equations with a bordered coefficient matrix. An effective parallelization is thus important, and we propose different approaches for the message-passing scenario. Some schemes use a subcommunicator strategy in order to improve the scalability whenever direct linear solvers are used. We show performance results for the various alternatives implemented in the context of SLEPc, the Scalable Library for Eigenvalue Problem Computations. Copyright © 2016 John Wiley & Sons, Ltd.
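
To make the refinement step concrete, here is a minimal serial sketch for a quadratic matrix polynomial P(λ) = A₀ + λA₁ + λ²A₂: each Newton step solves one bordered linear system for the eigenpair correction. The normalization vector c and the dense solvers are our illustrative choices; the paper's contribution concerns the parallel treatment of exactly these bordered solves.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A0, A1, A2 = (rng.standard_normal((n, n)) for _ in range(3))

def P(lam):   # quadratic matrix polynomial P(lam) = A0 + lam*A1 + lam^2*A2
    return A0 + lam * A1 + lam**2 * A2

def dP(lam):  # derivative P'(lam) = A1 + 2*lam*A2
    return A1 + 2 * lam * A2

# Reference eigenpair from the companion linearization of the quadratic.
C = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.linalg.solve(A2, A0), -np.linalg.solve(A2, A1)]])
evals, evecs = np.linalg.eig(C)
lam0, x0 = evals[0], evecs[:n, 0]
x0 = x0 / np.linalg.norm(x0)

# Perturb the eigenpair, then refine it by Newton's method on the bordered
# system  [P(lam)  P'(lam)x ; c^T  0] [dx; dlam] = -[P(lam)x ; c^T x - 1].
lam = lam0 * (1 + 1e-3)
x = x0 + 1e-3 * rng.standard_normal(n)
c = x.conj() / (x.conj() @ x)          # normalization vector, c^T x = 1

for _ in range(6):
    J = np.zeros((n + 1, n + 1), dtype=complex)
    J[:n, :n] = P(lam)
    J[:n, n] = dP(lam) @ x
    J[n, :n] = c
    rhs = -np.concatenate([P(lam) @ x, [c @ x - 1]])
    step = np.linalg.solve(J, rhs)
    x, lam = x + step[:n], lam + step[n]

residual = np.linalg.norm(P(lam) @ x) / np.linalg.norm(x)
```

Quadratic convergence of Newton drives the residual of the perturbed pair back to machine precision within a few bordered solves.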

This paper introduces a robust preconditioner for general sparse matrices based on low-rank approximations of the Schur complement in a Domain Decomposition framework. In this ‘Schur Low Rank’ preconditioning approach, the coefficient matrix is first decoupled by a graph partitioner, and then a low-rank correction is exploited to compute an approximate inverse of the Schur complement associated with the interface unknowns. The method avoids explicit formation of the Schur complement. We show the feasibility of this strategy for a model problem and conduct a detailed spectral analysis of the relation between the low-rank correction and the quality of the preconditioner. We first introduce the SLR preconditioner for symmetric positive definite matrices, and for symmetric indefinite matrices whose interface matrices are symmetric positive definite. Extensions to general symmetric indefinite matrices as well as to nonsymmetric matrices are also discussed. Numerical experiments on general matrices illustrate the robustness and efficiency of the proposed approach. Copyright © 2016 John Wiley & Sons, Ltd.
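
In outline, and in our own notation (a sketch under the assumption that the interface block C is symmetric positive definite), the kind of correction such a preconditioner exploits has the following shape:

```latex
A = \begin{pmatrix} B & E \\ E^{T} & C \end{pmatrix},
\qquad
S = C - E^{T} B^{-1} E,
\qquad
S^{-1} = C^{-1/2}\,(I - G)^{-1}\, C^{-1/2},
\quad
G = C^{-1/2} E^{T} B^{-1} E \, C^{-1/2},
\qquad
(I - G)^{-1} \approx I + U_{k}\,\operatorname{diag}\!\left(\frac{\theta_{i}}{1-\theta_{i}}\right) U_{k}^{T},
```

where (θ_i, u_i), i = 1, …, k, are the largest eigenpairs of G. Only products with E and solves with B are needed to build the correction, so S itself is never formed.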

We present an analysis for minimizing the condition number of nonsingular parameter-dependent 2 × 2 block-structured saddle-point matrices with a maximally rank-deficient (1,1) block. The matrices arise from an augmented Lagrangian approach. Using quasidirect sums, we show that a decomposition akin to simultaneous diagonalization leads to an optimization based on the extremal nonzero eigenvalues and singular values of the associated block matrices. Bounds on the condition number of the parameter-dependent matrix are obtained, and we demonstrate their tightness on some numerical examples. Copyright © 2016 John Wiley & Sons, Ltd.

Large-scale scientific computing models are needed for the simulation of wave propagation, especially for multiple-frequency and high-frequency models in complex heterogeneous media. Multigrid methods provide efficient iterative solvers for many large sign-definite systems of equations resulting from physical models. Time-harmonic wave propagation models, however, lead to sign-indefinite systems with eigenvalues in the left half of the complex plane, so standard multigrid approaches applied in conjunction with a low-order finite difference or finite element method are not sufficient. In this work, we describe a high-order finite element model for multiple (low to high) frequency time-harmonic acoustic wave propagation on general curved, non-convex, and non-smooth domains with heterogeneous media, using a multigrid approximation of the shifted Laplacian operator as a preconditioner. We implement the model using an efficient geometric multigrid approach with parallel grid transfer operator calculations and the BiCGStab iterative solver. We demonstrate the efficiency and parallel performance of the computational model with input incident waves ranging from low (5 wavelengths) to high frequency (100 wavelengths). Copyright © 2016 John Wiley & Sons, Ltd.
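
The underlying idea can be sketched in one dimension with assumed parameters: a complex shift of 1 + 0.5i in the preconditioner, and a sparse LU factorization standing in for the multigrid cycle the paper uses to apply it.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 1D Helmholtz model problem: (-d^2/dx^2 - k^2) u = f on (0,1), u(0)=u(1)=0.
n, k = 200, 40.0
h = 1.0 / (n + 1)
L = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
A = (L - k**2 * sp.identity(n)).astype(complex).tocsc()  # indefinite operator
b = np.ones(n, dtype=complex)

# Shifted-Laplacian preconditioner M = L - (1 + 0.5i) k^2 I.  Here it is
# inverted exactly with a sparse LU factorization as a stand-in for the
# multigrid approximation used in the paper.
M = (L - (1.0 + 0.5j) * k**2 * sp.identity(n)).tocsc()
Mlu = spla.splu(M)
prec = spla.LinearOperator(A.shape, matvec=Mlu.solve, dtype=complex)

x, info = spla.bicgstab(A, b, M=prec, maxiter=1000)
rel_res = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```

Without the shifted preconditioner, Krylov convergence for this indefinite system deteriorates rapidly as k grows; the complex shift keeps the preconditioner multigrid-friendly while staying close to the Helmholtz operator.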

Estimating the number of eigenvalues located in a given interval of a large sparse Hermitian matrix is an important problem in certain applications, and it is a prerequisite of eigensolvers based on a divide-and-conquer paradigm. Often, an exact count is not necessary, and methods based on stochastic estimates can be utilized to yield rough approximations. This paper examines a number of techniques tailored to this specific task. It reviews standard approaches and explores new ones based on polynomial and rational approximation filtering combined with a stochastic procedure. We also discuss how the latter method is particularly well-suited for the FEAST eigensolver. Copyright © 2016 John Wiley & Sons, Ltd.
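
One of the standard polynomial-filtering estimators reviewed in this setting can be sketched as follows: the trace of the spectral projector onto [a, b] is approximated by a damped Chebyshev expansion of the indicator function combined with Hutchinson-style Rademacher probes. The degree, probe count, and assumed spectral bounds below are illustrative choices.

```python
import numpy as np

# Estimate the eigenvalue count of a Hermitian matrix in [a, b] as
# tr(P) ~ mean_v  v^T p(A) v, where p is a damped Chebyshev approximation
# of the spectral projector and v are Rademacher probe vectors.
rng = np.random.default_rng(1)
n, deg, nprobe = 200, 80, 20
evals = np.linspace(0.0, 1.0, n)
A = np.diag(evals)                     # test matrix with known spectrum
a, b = 0.2, 0.4                        # target interval (contains 40 eigenvalues)
lmin, lmax = 0.0, 1.0                  # assumed spectral bounds

# Map [lmin, lmax] -> [-1, 1]; Chebyshev coefficients of the mapped indicator.
scale, shift = 2.0 / (lmax - lmin), (lmax + lmin) / (lmax - lmin)
ta, tb = np.arccos(scale * a - shift), np.arccos(scale * b - shift)
ks = np.arange(1, deg + 1)
c = np.empty(deg + 1)
c[0] = (ta - tb) / np.pi
c[1:] = 2.0 / (np.pi * ks) * (np.sin(ks * ta) - np.sin(ks * tb))
c[1:] *= np.sinc(ks / (deg + 1))       # Lanczos sigma-damping against Gibbs ringing

At = scale * A - shift * np.eye(n)     # mapped matrix with spectrum in [-1, 1]
est = 0.0
for _ in range(nprobe):
    v = rng.choice([-1.0, 1.0], size=n)    # Rademacher probe
    t0, t1 = v, At @ v                     # Chebyshev three-term recurrence
    acc = c[0] * (v @ t0) + c[1] * (v @ t1)
    for k in range(2, deg + 1):
        t0, t1 = t1, 2.0 * (At @ t1) - t0
        acc += c[k] * (v @ t1)
    est += acc / nprobe
```

Only matrix–vector products with A are required, which is what makes such estimators attractive as a preprocessing step for spectrum-slicing eigensolvers.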

The incompressible Stokes equations are a widely used model of viscous or tightly confined flow in which convection effects are negligible. In order to strongly enforce the conservation of mass at the element scale, special discretization techniques must be employed. In this paper, we consider a discontinuous Galerkin approximation in which the velocity field is *H*(div,Ω)-conforming and divergence-free, based on the Brezzi–Douglas–Marini finite-element space, with the complementary piecewise-constant space (*P*_{0}) for the pressure. Because of the saddle-point structure and the nature of the resulting variational formulation, the linear systems can be difficult to solve. Therefore, specialized preconditioning strategies are required in order to solve these systems efficiently. We compare the effectiveness of two families of preconditioners for saddle-point systems when applied to the resulting matrix problem. Specifically, we consider block-factorization techniques, in which the velocity block is preconditioned using geometric multigrid, as well as fully coupled monolithic multigrid methods. We present parameter study data and a serial timing comparison, and we show that a monolithic multigrid preconditioner using Braess–Sarazin style relaxation provides the fastest time to solution for the test problem considered. Copyright © 2016 John Wiley & Sons, Ltd.
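
Schematically, and in our own notation, the two preconditioning families compared here act on the discrete saddle-point system as follows (F denotes the velocity block):

```latex
\mathcal{K} = \begin{pmatrix} F & B^{T} \\ B & 0 \end{pmatrix},
\qquad
\mathcal{P}_{\mathrm{block}} = \begin{pmatrix} \hat{F} & 0 \\ B & -\hat{S} \end{pmatrix},
\qquad
\hat{S} \approx B F^{-1} B^{T},
```

where F̂ is applied by geometric multigrid cycles on the velocity block and Ŝ is an inexpensive approximation of the pressure Schur complement. The monolithic alternative instead applies multigrid directly to 𝒦, with a Braess–Sarazin-type relaxation that smooths velocity and pressure simultaneously.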

This paper proposes a new, low-communication algorithm for solving PDEs on massively parallel computers. The range decomposition (RD) algorithm exposes coarse-grain parallelism by applying nested iteration and adaptive mesh refinement locally before performing a global communication step. Just a few such steps are observed to be sufficient to obtain accuracy within a small multiple of discretization error. The target applications are petascale and exascale machines, where hierarchical parallelism is required and traditional parallel numerical PDE communication patterns are costly because of message latency. The RD algorithm uses a partition of unity to equally distribute the error, and thus, the work. The computational advantages of this approach are that the decomposed problems can be solved in parallel without any communication until the partitioned solutions are summed. This offers potential advantages in the paradigm of expensive communication but very cheap computation. This paper introduces the method and explains the details of the communication step. Two performance models are developed, showing that the latency cost associated with a traditional parallel implementation of nested iteration is proportional to log(*P*)^{2}, whereas the RD method reduces the communication latency to log(*P*), while maintaining similar bandwidth costs. Numerical results for two problems, Laplace and advection diffusion, demonstrate the enhanced performance, and a heuristic argument explains why the method converges quickly. Copyright © 2016 John Wiley & Sons, Ltd.

In this paper, two accelerated divide-and-conquer (ADC) algorithms are proposed for the symmetric tridiagonal eigenvalue problem, which cost *O*(*N*^{2}*r*) flops in the worst case, where *N* is the dimension of the matrix and *r* is a modest number depending on the distribution of eigenvalues. Both of these algorithms use hierarchically semiseparable (HSS) matrices to approximate some intermediate eigenvector matrices, which are Cauchy-like matrices and are off-diagonally low-rank. The difference between the two versions lies in the HSS construction algorithm used: one (denoted by ADC1) uses a structured low-rank approximation method, and the other (ADC2) uses a randomized HSS construction algorithm. For the ADC2 algorithm, a method is proposed to estimate the off-diagonal rank. Numerous experiments have been carried out to show their stability and efficiency. These algorithms are implemented in parallel in a shared memory environment, and some parallel implementation details are included. Comparing the ADCs with highly optimized multithreaded libraries such as Intel MKL, we find that ADCs can be more than six times faster for some large matrices with few deflations. Copyright © 2016 John Wiley & Sons, Ltd.

The triangular truncation operator is a linear transformation that maps a given matrix to its strictly lower triangular part. The operator norm (with respect to the matrix spectral norm) of the triangular truncation is known to have logarithmic dependence on the dimension, and this dependence is usually illustrated by a specific Toeplitz matrix. However, the precise value of this operator norm, as well as the matrices on which it is attained, is still unclear. In this article, we describe a simple way of constructing matrices whose strictly lower triangular part has a spectral norm logarithmically larger than that of the full matrix. The construction also leads to a sharp estimate that is very close to the actual operator norm of the triangular truncation. This research is directly motivated by our studies on the convergence theory of the Kaczmarz-type method (or, equivalently, the Gauß–Seidel-type method), a corresponding application of which is also included. Copyright © 2016 John Wiley & Sons, Ltd.
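
The logarithmic growth is easy to reproduce numerically on the classical Toeplitz example mentioned above, with entries 1/(i − j) off the diagonal: the full matrix has spectral norm bounded by π for every dimension, while its strictly lower triangular part keeps growing.

```python
import numpy as np

# Toeplitz matrix T with T[i, j] = 1/(i - j) for i != j, zero diagonal.
# ||T||_2 stays below pi for all n, while the strictly lower triangular
# truncation L = tril(T, -1) grows like log(n)/pi.
def norms(n):
    i, j = np.indices((n, n))
    d = i - j
    T = np.where(d != 0, 1.0 / np.where(d == 0, 1, d), 0.0)
    L = np.tril(T, k=-1)              # strictly lower triangular truncation
    return np.linalg.norm(T, 2), np.linalg.norm(L, 2)

full64, low64 = norms(64)
full1024, low1024 = norms(1024)
```

The bounded full-matrix norm together with the growing truncated norm is precisely the phenomenon quantified by the operator norm of the triangular truncation.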

We consider the nonlinear eigenvalue problem *M*(*λ*)*x* = 0, where *M*(*λ*) is a large parameter-dependent matrix. In several applications, *M*(*λ*) has a structure where the higher-order terms of its Taylor expansion have a particular low-rank structure. We propose a new Arnoldi-based algorithm that can exploit this structure. More precisely, the proposed algorithm is equivalent to Arnoldi's method applied to an operator whose reciprocal eigenvalues are solutions to the nonlinear eigenvalue problem. The iterates in the algorithm are functions represented in a particular structured vector-valued polynomial basis similar to the construction in the infinite Arnoldi method [Jarlebring, Michiels, and Meerbergen, Numer. Math., 122 (2012), pp. 169–195]. In this paper, the low-rank structure is exploited by applying an additional operator and by using a more compact representation of the functions. This reduces the computational cost associated with orthogonalization, as well as the required memory resources. The structure exploitation also provides a natural way of carrying out implicit restarting and locking without the need to impose structure in every restart. The efficiency and properties of the algorithm are illustrated with two large-scale problems. Copyright © 2016 John Wiley & Sons, Ltd.

Several iterative methods for maximal correlation problems (MCPs) have been proposed in the literature. This paper deals with the convergence of these iterations and contains three contributions. Firstly, a unified and concise proof of the monotone convergence of these iterative methods is presented. Secondly, a starting point strategy is analysed. Thirdly, some error estimates are presented to test the quality of a computed solution. Both theoretical results and numerical tests suggest that, combined with this starting point strategy, these methods converge rapidly and are more likely to converge to a global maximizer of the MCP. Copyright © 2016 John Wiley & Sons, Ltd.

Some modulus-based matrix splitting iteration methods for a class of implicit complementarity problems are presented, and their convergence analysis is given. Numerical experiments confirm the theoretical analysis and show that the proposed methods are efficient. Copyright © 2016 John Wiley & Sons, Ltd.

This paper discusses the application of a multigrid-in-time scheme to Least Squares Shadowing (LSS), a novel sensitivity analysis method for chaotic dynamical systems. While traditional sensitivity analysis methods break down for chaotic dynamical systems, LSS is able to compute accurate gradients. Multigrid is used because LSS requires solving a very large Karush–Kuhn–Tucker system constructed from the solution of the dynamical system over the entire time interval of interest. Several different multigrid-in-time schemes are examined, and a number of factors are found to heavily influence the convergence rate of multigrid-in-time for LSS. These include the iterative method used for the smoother, how the coarse grid system is formed, and how the least squares objective function at the center of LSS is weighted. Copyright © 2014 John Wiley & Sons, Ltd.

No abstract is available for this article.

The Jacobi–Davidson (JD) algorithm is considered one of the most efficient eigensolvers currently available for non-Hermitian problems. It can be viewed as a coupled inner-outer iteration, where the inner one expands the search subspace and the outer one reduces the eigenpair residual. One of the difficulties in the efficient use of JD stems from the definition of the most appropriate inner tolerance, so as to avoid useless extra work and keep the number of outer iterations under control. To this aim, the use of an efficient preconditioner for the inner iterative solver is of paramount importance. The present paper describes a fresh implementation of the JD algorithm with controlled inner iterations and block factorized sparse approximate inverse preconditioning for non-Hermitian eigenproblems in a parallel computational environment. The algorithm performance is investigated by comparison with the freely available SLEPc package. The results show that combining the inner tolerance control with an efficient preconditioning technique can allow for a significant improvement of the JD performance while preserving good scalability. Copyright © 2016 John Wiley & Sons, Ltd.

In this paper, we propose a shifted symmetric higher-order power method for computing the H-eigenpairs of a real symmetric even-order tensor. The local convergence of the method is proved. In addition, by utilizing fixed-point analysis, we characterize exactly which H-eigenpairs can be found by the method and which cannot. Numerical examples are presented to illustrate the performance of the method. Copyright © 2016 John Wiley & Sons, Ltd.
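
A minimal sketch of the closely related shifted power iteration of Kolda and Mayo for Z-eigenpairs (T x³ = λx with ‖x‖₂ = 1) of a symmetric order-4 tensor: the H-eigenpair variant of the paper modifies the normalization (componentwise powers and the m-norm), and our shift α is a deliberately conservative bound chosen only to guarantee monotone convergence of the sketch.

```python
import numpy as np
from itertools import permutations

# Shifted symmetric higher-order power iteration (Kolda-Mayo style) for a
# symmetric order-4 tensor.  Note: this sketch computes Z-eigenpairs
# (T x^3 = lam * x, ||x||_2 = 1); the paper's H-eigenpair variant replaces
# the 2-norm normalization by componentwise (m-1)-th powers.
rng = np.random.default_rng(2)
n = 3
T = rng.standard_normal((n, n, n, n))
T = sum(np.transpose(T, p) for p in permutations(range(4))) / 24.0  # symmetrize

def Tx3(x):                     # contraction T x^3, a vector
    return np.einsum('ijkl,j,k,l->i', T, x, x, x)

# Conservative shift >= (m-1)*||T||_F, large enough for monotone convergence.
alpha = 3.0 * np.sqrt(np.sum(T * T)) + 1.0
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
for _ in range(50000):
    y = Tx3(x) + alpha * x      # shifted power step
    x = y / np.linalg.norm(y)

lam = x @ Tx3(x)
res = np.linalg.norm(Tx3(x) - lam * x)
```

A larger shift slows the iteration but widens the set of starting points from which it converges; the fixed-point analysis in the paper makes this trade-off precise for the H-eigenpair setting.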

In this paper, we study a class of tuned preconditioners designed to accelerate both the DACG–Newton method and the implicitly restarted Lanczos method for the computation of the leftmost eigenpairs of large and sparse symmetric positive definite matrices arising in large-scale scientific computations. These tuning strategies are based on low-rank modifications of a given initial preconditioner. We present some theoretical properties of the preconditioned matrix. We experimentally show how the aforementioned methods benefit from the acceleration provided by these tuned/deflated preconditioners. Comparisons are carried out with the Jacobi–Davidson method on matrices from various large realistic problems arising from finite element discretizations of PDEs modeling either groundwater flow in porous media or geomechanical processes in reservoirs. The numerical results show that the Newton-based methods (which also include the Jacobi–Davidson method) are to be preferred to the – however efficiently implemented – implicitly restarted Lanczos method whenever a small to moderate number of eigenpairs is required. Copyright © 2016 John Wiley & Sons, Ltd.

This paper studies some well-known iterative methods in their tensor forms for solving a Sylvester tensor equation. More precisely, the tensor forms of the Arnoldi process and the full orthogonalization method are derived by using a product between two tensors. Tensor forms of the conjugate gradient and nested conjugate gradient algorithms are also presented. A rough estimate of the number of operations required by the tensor form of the Arnoldi process is obtained, which reveals the advantage of handling the algorithms in tensor format over their classical forms. Numerical experiments are presented, which confirm the feasibility and applicability of the proposed algorithms in practice. Copyright © 2016 John Wiley & Sons, Ltd.
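
The flavor of such tensor-format algorithms can be illustrated with a conjugate gradient iteration for a three-dimensional Sylvester tensor equation with symmetric positive definite coefficients (our own minimal setting; the paper additionally develops the Arnoldi/FOM and nested CG variants). The unknown tensor is never vectorized: the operator acts mode-wise.

```python
import numpy as np

# CG in tensor format for the Sylvester tensor equation
#   X x1 A + X x2 B + X x3 C = D
# with SPD A, B, C, so the overall operator is SPD and CG applies.
rng = np.random.default_rng(3)
n = 10

def spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, B, C = spd(n), spd(n), spd(n)

def op(X):   # mode-1, mode-2, and mode-3 products, summed
    return (np.einsum('ia,ajk->ijk', A, X)
            + np.einsum('jb,ibk->ijk', B, X)
            + np.einsum('kc,ijc->ijk', C, X))

Xtrue = rng.standard_normal((n, n, n))
D = op(Xtrue)

X = np.zeros((n, n, n))
R = D - op(X)                   # residual tensor
P = R.copy()
rho = np.sum(R * R)
for _ in range(500):
    Q = op(P)
    alpha = rho / np.sum(P * Q)
    X += alpha * P
    R -= alpha * Q
    rho_new = np.sum(R * R)
    if np.sqrt(rho_new) < 1e-10 * np.sqrt(np.sum(D * D)):
        break
    P = R + (rho_new / rho) * P
    rho = rho_new

err = np.linalg.norm((X - Xtrue).ravel()) / np.linalg.norm(Xtrue.ravel())
```

Working directly on tensors replaces the O(n⁶) cost of a vectorized Kronecker-sum matvec by three O(n⁴) mode products, which is the operation-count advantage the paper quantifies for the Arnoldi process.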

Generalized cross validation is a popular approach to determining the regularization parameter in Tikhonov regularization. The regularization parameter is chosen by minimizing an expression, which is easy to evaluate for small-scale problems, but prohibitively expensive to compute for large-scale ones. This paper describes a novel method, based on Gauss-type quadrature, for determining upper and lower bounds for the desired expression. These bounds are used to determine the regularization parameter for large-scale problems. Computed examples illustrate the performance of the proposed method and demonstrate its competitiveness. Copyright © 2016 John Wiley & Sons, Ltd.
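
For a problem small enough to afford an SVD, the expression in question and its minimization look as follows; the paper's point is precisely that this direct evaluation is infeasible at large scale, where the Gauss-quadrature bounds take over. The test matrix, noise level, and parameter grid are illustrative choices.

```python
import numpy as np

# Tikhonov regularization of an ill-conditioned least-squares problem, with
# the parameter chosen by minimizing the GCV function
#   G(mu) = n * ||(I - A A_mu^+) b||^2 / trace(I - A A_mu^+)^2,
# evaluated cheaply through the SVD (only feasible for small problems).
rng = np.random.default_rng(4)
n = 12
idx = np.arange(n)
A = 1.0 / (idx[:, None] + idx[None, :] + 1.0)   # Hilbert matrix, ill-conditioned
xtrue = np.ones(n)
b = A @ xtrue + 1e-4 * rng.standard_normal(n)

U, s, Vt = np.linalg.svd(A)
beta = U.T @ b

def gcv(mu):
    f = s**2 / (s**2 + mu**2)        # Tikhonov filter factors
    return n * np.sum(((1 - f) * beta) ** 2) / (n - np.sum(f)) ** 2

mus = np.logspace(-12, 0, 200)
mu_star = mus[np.argmin([gcv(mu) for mu in mus])]

def solve(mu):                       # Tikhonov solution for parameter mu
    f = s / (s**2 + mu**2)
    return Vt.T @ (f * beta)

err_gcv = np.linalg.norm(solve(mu_star) - xtrue)
err_tiny = np.linalg.norm(solve(1e-14) - xtrue)
```

With essentially no regularization the noise is amplified by the reciprocals of the tiny singular values, while the GCV-selected parameter suppresses it; for large-scale problems each evaluation of G(μ) would require the trace and residual terms that the quadrature bounds approximate.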

The use of the fast Fourier transform (FFT) accelerates the Lanczos tridiagonalisation method for Hankel and Toeplitz matrices by reducing the complexity of the matrix–vector multiplication. In multiprecision arithmetic, however, the FFT has overheads that make it less competitive compared with alternative methods when the accuracy is over 10000 decimal places. We studied two alternative Hankel matrix–vector multiplication methods, based on multiprecision number decomposition and on recursive Karatsuba-like multiplication, respectively. The first method was uncompetitive because of huge precision losses, while the second turned out to be five to fourteen times faster than the FFT in the ranges of matrix sizes up to *n* = 8192 and working precision of *b* = 32768 bits we were interested in. We successfully applied our approach to eigenvalue calculations in studies of spectra of matrices that arise in research on the Riemann zeta function. The recursive matrix–vector multiplication significantly outperformed both the FFT and the traditional multiplication in these studies. Copyright © 2016 John Wiley & Sons, Ltd.
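
The FFT-based Hankel matrix–vector product that serves as the baseline here can be sketched in double precision as follows (the paper's regime is multiprecision, where this embedding acquires the overheads discussed above): a Hankel product is a cross-correlation, so it reduces to one zero-padded convolution.

```python
import numpy as np

# A Hankel matrix H with entries H[i, j] = h[i + j] acts on a vector as a
# cross-correlation, so H @ x can be evaluated in O(n log n) with the FFT:
# full convolution of h with the reversed x, keeping the middle n entries.
rng = np.random.default_rng(5)
n = 512
h = rng.standard_normal(2 * n - 1)          # defining sequence h_0 .. h_{2n-2}
x = rng.standard_normal(n)

def hankel_matvec_fft(h, x):
    n = len(x)
    m = (2 * n - 1) + n - 1                 # full linear-convolution length
    L = 1 << (m - 1).bit_length()           # next power of two for the FFT
    y = np.fft.irfft(np.fft.rfft(h, L) * np.fft.rfft(x[::-1], L), L)
    return y[n - 1:2 * n - 1]

H = h[np.add.outer(np.arange(n), np.arange(n))]   # dense Hankel, for reference
err = (np.linalg.norm(hankel_matvec_fft(h, x) - H @ x)
       / np.linalg.norm(H @ x))
```

In multiprecision arithmetic the three length-L transforms must each be carried out on decomposed high-precision numbers, which is the overhead that makes the recursive Karatsuba-like alternative competitive.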

A typical approach to decrease computational costs and memory requirements of classical algebraic multigrid methods is to replace a conservative coarsening algorithm and short-distance interpolation on a fixed number of fine levels by an aggressive coarsening with a long-distance interpolation. Although the quality of the resulting algebraic multigrid preconditioner often deteriorates in terms of convergence rates and iteration counts of the preconditioned iterative solver, the overall performance can improve substantially. We investigate here, as an alternative, the possibility of replacing the classical aggressive coarsening by aggregation, which is motivated by the fact that the convergence of aggregation methods can be independent of the problem size provided that the number of levels is fixed. The relative simplicity of aggregation can lead to improved solution and setup costs. The numerical experiments show the relevance of the proposed combination on both academic and benchmark problems in reservoir simulation from the oil industry. Copyright © 2016 John Wiley & Sons, Ltd.

We present a second-order expansion for the singular subspace decomposition of real matrices. Furthermore, we show that, under particular assumptions, the obtained results reduce to existing ones. Some numerical examples are provided to confirm the theoretical developments of this study. Copyright © 2016 John Wiley & Sons, Ltd.

Sums of Kronecker products occur naturally in high-dimensional spline approximation problems, which arise, for example, in the numerical treatment of chemical reactions. In full matrix form, the resulting non-sparse linear problems usually exceed the memory capacity of workstations. We present methods for the manipulation and numerical handling of Kronecker products in factorized form. Moreover, we analyze the problem of approximating a given matrix by sums of Kronecker products by making use of the equivalence to the problem of decomposing multilinear forms into sums of one-forms. Greedy algorithms based on the maximization of multilinear forms over a torus are used to obtain such (finite and infinite) decompositions that can be used to solve the approximation problem. Moreover, we present numerical considerations for these algorithms. Copyright © 2016 John Wiley & Sons, Ltd.
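
The link between Kronecker-product approximation and multilinear-form decomposition is classically exposed by the Van Loan–Pitsianis rearrangement, sketched here for a single Kronecker term; the greedy torus-maximization algorithms of the paper generalize this rank-1 step to sums of terms.

```python
import numpy as np

# Van Loan-Pitsianis rearrangement: min ||A - kron(B, C)||_F is a rank-1
# approximation problem for the rearranged matrix R(A), whose rows are the
# flattened p-by-q blocks of A.  The leading singular triplet yields B and C.
rng = np.random.default_rng(6)
m, n, p, q = 3, 4, 5, 2
B0 = rng.standard_normal((m, n))
C0 = rng.standard_normal((p, q))
A = np.kron(B0, C0)                    # exactly a Kronecker product, for testing

R = np.empty((m * n, p * q))
for i in range(m):
    for j in range(n):                 # row (i, j) holds block A_ij, flattened
        R[i * n + j] = A[i * p:(i + 1) * p, j * q:(j + 1) * q].ravel()

U, S, Vt = np.linalg.svd(R, full_matrices=False)
B = np.sqrt(S[0]) * U[:, 0].reshape(m, n)
C = np.sqrt(S[0]) * Vt[0].reshape(p, q)
err = np.linalg.norm(A - np.kron(B, C)) / np.linalg.norm(A)
```

Keeping further singular triplets of R yields an approximation by a sum of Kronecker products, which is the factorized form whose manipulation the paper addresses.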

In this paper, we study a class of weakly nonlinear complementarity problems arising from the discretization of free boundary problems. By reformulating the complementarity problems as implicit fixed-point equations based on splittings of the system matrix, we propose a class of modulus-based matrix splitting algorithms. We prove their convergence under the assumption that the system matrix is positive definite. Moreover, we give several typical practical choices of modulus-based matrix splitting iteration methods based on different splittings of the system matrix. Numerical experiments on two model problems are presented to illustrate the theoretical results and examine the numerical effectiveness of our modulus-based matrix splitting algorithms. Copyright © 2016 John Wiley & Sons, Ltd.
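
For the linear prototype of these methods, the modulus-based fixed-point iteration can be sketched as follows; our parameters (Ω = diag(M), γ = 1, and a strictly diagonally dominant tridiagonal M) are illustrative choices, not those of the paper.

```python
import numpy as np

# Modulus-based splitting for the linear complementarity problem
#   x >= 0,  w = M x + q >= 0,  x^T w = 0,
# via the equivalent fixed-point equation (Omega + M) z = (Omega - M)|z| - q,
# with x = |z| + z and w = Omega (|z| - z)  (gamma = 1).
rng = np.random.default_rng(7)
n = 50
M = (np.diag(4.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))     # SPD, strictly diagonally dominant
q = rng.standard_normal(n)
Omega = np.diag(np.diag(M))              # a common choice: Omega = diag(M)

lhs = Omega + M
z = np.zeros(n)
for _ in range(200):
    z = np.linalg.solve(lhs, (Omega - M) @ np.abs(z) - q)

x = np.abs(z) + z                        # nonnegative by construction
w = M @ x + q
comp = np.linalg.norm(np.minimum(x, w))  # complementarity residual
```

Note that x ≥ 0 holds exactly at every iterate, and x and w are complementary componentwise at the fixed point since (|z| + z)ᵢ(|z| − z)ᵢ = 0; the nonlinear methods of the paper replace q by a weakly nonlinear term inside the same fixed-point structure.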

In this paper, we consider the solution of a large linear system of equations, which is obtained from discretizing the Euler–Lagrange equations associated with the image deblurring problem. The coefficient matrix of this system is of the generalized saddle point form with high condition number. One of the blocks of this matrix has the block Toeplitz with Toeplitz block structure. This system can be efficiently solved using the minimal residual iteration method with preconditioners based on the fast Fourier transform. Eigenvalue bounds for the preconditioner matrix are obtained. Numerical results are presented. Copyright © 2016 John Wiley & Sons, Ltd.