Highly accurate special quadrature methods for Stokesian particle suspensions in confined geometries

Boundary integral methods are highly suited for problems with complicated geometries, but require special quadrature methods to accurately compute the singular and nearly singular layer potentials that appear in them. This paper presents a boundary integral method that can be used to study the motion of rigid particles in three-dimensional periodic Stokes flow with confining walls. A centrepiece of our method is the highly accurate special quadrature method, which is based on a combination of upsampled quadrature and quadrature by expansion (QBX), accelerated using a precomputation scheme. The method is demonstrated for rodlike and spheroidal particles, with the confining geometry given by a pipe or a pair of flat walls. A parameter selection strategy for the special quadrature method is presented and tested. Periodic interactions are computed using the Spectral Ewald (SE) fast summation method, which allows our method to run in O(n log n) time for n grid points, assuming the number of geometrical objects grows while the grid point concentration is kept fixed.


Introduction
Microhydrodynamics is the study of uid ow at low Reynolds numbers, also known as Stokes ow or creeping ow. Applications are found in biology, for example in the swimming of microorganisms [20] and in blood ow [34], as well as in the eld of micro uidics, which concerns the design and construction of miniaturized uid devices [51]. Suspensions of rigid particles in Stokes ow are important both in various applications and in fundamental uid mechanics [46,16,36,19]. In this paper, we describe a boundary integral method that can be used to study the motion of rigid particles of di erent shapes in Stokes ow. The particle suspension may also be con ned in a container geometry, such as a pipe or a pair of at walls. The ow in the uid domain (i.e. within the container but outside the particles) is governed by the Stokes equations, which for an incompressible Newtonian uid take the form Here, p is the pressure, u is the ow velocity, f is the body force per unit volume and µ is the viscosity of the uid. The Stokes equations arise as a linearization of the Navier-Stokes equations in the case where uid inertia can be neglected, i.e. when the Reynolds number is much less than 1.
On the surfaces of the particles and walls, no-slip boundary conditions are prescribed. A problem of physical interest is the resistance problem: given the velocities of all particles, compute the forces and torques (caused by viscous resistance) acting on them by the uid. The inverse problem is called the mobility problem: given the forces and torques acting on all particles by the uid, compute the particle velocities. The mobility problem is useful in the case of noninertial particles, since then the net force on each particle must be zero, so any external forces (such as gravity) must be balanced by viscous forces from the uid; given external forces and torques, one can then compute the motion of the particles.
Since the governing equations (1)- (2) are linear, boundary integral methods can be used to solve them. In these methods, the ow is expressed in terms of layer potentials, which are integrals over the boundary of the uid domain (i.e. over the container walls and particle surfaces). This reduces the dimensionality of the problem from three to two, and leads to a smaller linear system compared to methods that must discretize the whole volume (such as the nite di erence or nite element methods). It it also easy to move the particles, since no remeshing is needed. For a detailed discussion on the properties of boundary integral methods, we refer to the books by Pozrikidis [42], Atkinson [1] and Kress [30]. Of special importance are Fredholm integral equations of the second kind, which when discretized, for example using the Nyström method [1, ch. 4] [30, sec. 12.2], are known to remain well-conditioned as the system size increases [1, p. 113] [30, p. 282].
The linear system resulting from the discretization of a boundary integral equation is dense, and thus naive Gaussian elimination would require O(N 3 ) operations to solve a system of N unknowns. Using an iterative solution method such as the generalized minimal residual method (GMRES) [47], the complexity is reduced to O(N 2 ) since the condition number and thus the number of iterations are independent of the system size (but may depend on the geometry). For a large system, this complexity is still prohibitive. This can be overcome by using a fast summation method such as the fast multipole method (FMM) [17,18] or a fast Ewald summation method [13,32,27], which reduce the complexity further to O(N ) or O(N log N ), respectively.
One of the challenges of boundary integral methods is the need for accurate special quadrature methods for singular and nearly singular integrands. These are necessary when evaluating the layer potentials at a point on the boundary (where the integral kernel is singular) or close to the boundary (where the kernel is nearly singular, i.e. hard to resolve using a quadrature rule designed for smooth integrands). Such special quadrature methods are the main focus of this paper.

Overview of related work
In two dimensions, there are excellent special quadrature methods available, such as the one introduced by Helsing and Ojala [21], which has been adapted to simulations of clean [37] and surfactant-covered [39] drops in Stokes ow, as well as vesicles [4]. However, this method is based on a complex variable formulation and not easy to generalize to three dimensions.
In three dimensions, the development of an accurate and e cient special quadrature method is still an active research problem, especially in the nearly singular case. For an overview of methods that have been used, we refer to [29, sec. 1] and [45, sec. 1]. One of the most promising methods which is still under development is quadrature by expansion (QBX), rst introduced by Klöckner et al. [29] and Barnett [3] and applied to the Helmholtz equation in two dimensions. This method is based on the observation that the layer potentials are smooth all the way up to the boundary, and can therefore be locally expanded around a point away from the boundary. The expansions can be evaluated at a point closer to the boundary, or even on the boundary itself. The convergence theory of QBX was developed in [14], while [26] analyzed the error from the underlying quadrature rule used to compute the expansion coe cients. A strength of QBX is that it separates source points and target points; the source points enter only in the computation of the expansion coe cients, which can then be used to evaluate the layer potential in all target points within a ball of convergence. QBX has been applied to spheroidal particles in three-dimensional Stokes ow by af Klinteberg and Tornberg [25], using a geometry-speci c precomputation scheme to accelerate the computation of the coe cients.
A di erent approach that has been taken to accelerate QBX is to couple it to a customized FMM, which has been done in two dimensions [43,44,58] and more recently in three dimensions [59,60]. This coupling is a natural step to take since the FMM uses expansions of the same kind as QBX, but it requires nontrivial modi cations to the FMM. The resulting method has complexity O(N ) and works for any smooth geometry. The work published so far has been for the Laplace and Helmholtz equations, but it is likely to be extended to more kernels, including the ones needed for Stokes ow.
The QBX-FMM methods above all use global QBX, in which all source points are included in forming the local expansion. An alternative is local QBX, in which only source points that are close to the expansion centre are included. Yet another variant is found in [25], where all source points on a single particle is used when forming expansions close to that particle; we call this variant particle-global. Local QBX is typically combined with a patchbased discretization of the geometry. While it reduces the cost of the method, it also poses a challenge since the local layer potential from a single patch may not be as smooth as the global layer potential from the whole geometry (or a whole particle). Di erent versions of local QBX have been described in two dimensions [3,43] and three dimensions [49]. The latter paper also uses target-speci c expansions, which need only O(p) terms to obtain the same accuracy as a QBX expansion based on spherical harmonics with O(p 2 ) terms. However, they sacri ce the separation of source and target that is otherwise present in QBX. This separation is in principle necessary in the QBX-FMM methods, but also in these methods can target-speci c expansions be used to lower the computational cost of the method [60].
Some of the recent work have focused on automating the parameter selection based on a given error tolerance, resulting in the adaptive QBX method [28]. The results have so far not been generalized to three dimensions. There has also been work on a kernel-independent version of QBX, called quadrature by kernel-independent expansion (QB-KIX) [45] and meant to be combined with the kernel-independent FMM. The published work is in two dimensions, but a generalization to three dimensions is expected to follow.
Other methods, which are not based on QBX, have also been used successfully as special quadrature methods in three dimensions. One example is the "line interpolation method" introduced by Ying et al. [57]. In this method, a line is constructed through the target point, which is close to the boundary, and its projection onto the boundary. The layer potential is evaluated at points further away from the boundary along this line, and also at the projection point where the line intersects the boundary if a separate singular integration method is available. The value at the target point is then computed using interpolation along the line (or extrapolation if no singular integration method is available). Like QBX, the success of this method hinges on the fact that they layer potential is smooth in the domain, so that it can be well interpolated (or extrapolated). It has been applied to surfactant-covered drops [50] and vesicles [35] in three-dimensional Stokes ow. The extrapolatory method used in [34] falls into the same category. Other types of methods are based on regularizing the kernel and adding corrections [5,53,6,54], density interpolation techniques [40], coordinate rotations and a subtraction method [10], asymptotic approximations [11], analytical expressions available only for spheres [12] or oating partitions of unity [9,61,23]. Many of these methods are target-speci c, and their cost grows rapidly if there are many nearly singular target points.

Scope of this paper
In this paper, we present a boundary integral method based on the Stokes double layer potential, which can be used to solve the mobility and resistance problems for a system of rigid particles in incompressible three-dimensional Stokes ow, possibly con ned within a container geometry. Our formulation leads to a Fredholm integral equation of the second kind. We use QBX for singular integration, and a combination of QBX and upsampled quadrature for nearly singular integration. Our QBX implementation is based on the work by af Klinteberg and Tornberg [25], which we have extended to rodlike particles, plane walls and pipes (using particle-global QBX for the particles and local QBX for the two wall geometries). A precomputation scheme is used to greatly accelerate the QBX computations for all geometries. For this precomputation scheme to be feasible, we require that each particle or wall is rigid and has some degree of symmetry, such as axisymmetry or re ective symmetry. Nonetheless, we have chosen this route since the implementation is relatively simple compared to e.g. a QBX-FMM method. When container walls are present, we restrict ourselves to periodicity in all three spatial directions and use a fast Ewald summation method called the Spectral Ewald method [31,23,24] to accelerate computations. 1 In this situation our method scales as O(N log N ) in the number of unknowns N , assuming xed grid point concentration. The container geometry is restricted to a periodic straight pipe or a pair of periodic plane walls. Our contributions include: • The combined special quadrature method based on QBX and upsampling, which we have implemented for spheroidal and rodlike particles, plane walls and pipes. (The QBX implementation for spheroids is reused from [25]. Our initial work on QBX for plane walls is published in the conference proceedings [2].) • A strategy for experimentally selecting the parameters of the special quadrature method to meet a given error tolerance. We also demonstrate that the boundary integral method in full meets the given error tolerance and scales as O(N log N ).
• Construction of fully smooth rodlike particles. We demonstrate the e ect of smoothness on the convergence of the local expansions in this particular case.
• Derivation of a stresslet identity for an in nite pipe and a pair of in nite plane walls. This is used as an exact solution to test the special quadrature method.
• An outline of how streamlines can be e ciently computed for periodic problems using the Spectral Ewald method, by reusing data. (This idea was used, but not explicitly described, in [25].) • The so-called completion sources that appear in our formulation are distributed along the axis of symmetry of rodlike particles, and we have studied how the number of completion sources in uences the accuracy.

Organization of the paper
In section 2, we introduce the mathematical formulation of the problem, including the boundary integral formulation and the boundary integral equations for the resistance and mobility problems. In section 3, we describe the discretization of the geometry and the quadrature method, including the combined special quadrature. The details on our QBX method are then given in section 4, including the precomputation scheme. In section 5, we describe how periodicity is treated and how the special quadrature is combined with the Spectral Ewald method. Then, in section 6, our parameter selection strategy for the special quadrature is described and demonstrated. Numerical results are given in section 7, to demonstrate the accuracy and scaling of our method. Finally, in section 8, we demonstrate the e ect of nonsmooth geometries on the convergence. The appendices include a derivation of the stresslet identity for plane walls and pipes, details on the construction of the smooth rodlike particles, and a note on streamline computation.

Mathematical formulation
We consider two di erent kinds of problems: free-space problems and fully periodic problems. In a free-space problem, M particles (spheroids or rods) are located in a uid extending to in nity. We denote the uid domain by Ω and its boundary, i.e. the union of all particle surfaces, by Γ. The Stokes equations (1)-(2) with f = 0 hold in Ω, while no-slip boundary conditions are imposed on Γ. The unit normal vector n of Γ is de ned to point into the uid domain Ω, as shown in Figure 1 (a).
The geometry for (a) a free-space problem, and (b) a fully periodic problem. In (b), the primary cell is marked with a darker outline than the other cells. The geometry is shown only in the primary cell. The lattice of periodic cells lls the whole space; only a small part is shown here.
A fully periodic problem, on the other hand, is periodic in all three spatial directions. The primary cell is a box with side lengths B = (B 1 , B 2 , B 3 ), which is considered to be replicated periodically in all three spatial directions, as illustrated by Figure 1 (b). Let the number of particles in the primary cell be M. In this case we also allow a container consisting either of a pair of plane walls or a pipe. Only the geometry inside the primary cell is discretized, which means that Γ consists of the union of the M particle surfaces and the parts of the wall surfaces that lie in the primary cell. 2 The uid domain Ω lies within the container but outside the particles; the ow is thus external to the particles, but internal to the surrounding walls. The unit normal vector n of Γ is always de ned to point into Ω.
Below, we introduce our boundary integral formulation in the free-space setting, or for the primary cell without periodicity. Full treatment of the periodic problem is deferred to section 5.

Boundary integral formulation
Any ow eld u that satis es the Stokes equations (1)-(2) with f = 0 can be expressed in terms of integrals over the boundary of the uid domain Ω, as described for example by Pozrikidis [42, ch. 4] and Kim and Karrila [22, ch. 14-16]. The boundary integral formulation that we use is based on the Stokes double layer potential D, which in free space is given by 3 Here, Γ and n are as in Figure 1, and the double layer density q is a continuous vector eld de ned on Γ. The tensor kernel T in (3) is known as the stresslet. The potential D has a jump discontinuity as x passes over Γ. More speci cally, for x ∈ Γ it holds that [42, p. 110] For any closed Lyapunov surfaceΓ ⊆ Γ and any constant vectorq, the stresslet identity [42, p. 28] holds. We will use this identity as a test case for the special quadrature method in sections 6 and 7.1. In appendix A we show that a variant of (5) holds also for the wall geometries that we consider, despite them not being closed surfaces.
The double layer potential u(x) = D[Γ, q](x) is a solution to (1)- (2) in Ω for any continuous vector eld q. However, not every solution to (1)-(2) can be represented by a double layer potential alone; for instance, as noted in [41][42, p. 119], the force and torque exerted on any particle by the ow from a double layer potential will always be zero, whereas a Stokes ow in general can exert a nonzero force and torque on the particles (which is a central point of the resistance and mobility problems mentioned in section 1). This is related to the presence of a nontrivial nullspace of the operator q → D[Γ, q] for external ows, which can be immediately seen from the stresslet identity (5).
To remove the nontrivial nullspace and allow for nonzero forces and torques on the particles, we add a completion ow V, rst introduced by Power and Miranda [41]. The completion ow is also a solution to (1)-(2) and can be identi ed as the ow from a point force F and a point torque τ located at . It is given by where the stokeslet S and the rotlet R are given by 4 respectively. We call a pair (F , τ ) a completion source. Such completion sources are placed in the interior of every particle. Mathematically, one completion source per particle is su cient, but this may lead to numerical problems in some cases. In this paper, we allow for multiple completion sources to be distributed along a line segment within the particle; as we show in section 7.2.1, this is important for elongated particles. For the particle with index α, let F (α ) and τ (α ) be the net force and torque, respectively, exerted on the uid by the particle, and let (α ) c be the centre of mass of the particle. (For a noninertial particle, F (α ) and τ (α ) would be equal to the net external force and torque, respectively, acting on the particle.) The completion ow associated with particle α is then given by where N src is the number of completion sources per particle, V is given by (6), and a (α ) is a vector specifying the line segment along which completion sources are placed. The function C is given by Both the double layer potential D and the completion ow V have the property that they decay to zero as x → ∞. To be able to represent ows which do not decay, we add a background ow u bg , which is a known solution to (1)- (2) in the whole physical space, ignoring all particles and walls. The total ow u in the presence of particles and walls is thus written as where u d is a disturbance ow which is responsible for enforcing the no-slip boundary conditions on the solid boundary Γ. As x moves away from Γ, the disturbance ow u d should decay to zero, and the total ow should therefore approach the background ow u bg . The disturbance ow is written as where V (α ) is as in (8), and the double layer density q must be determined through the boundary conditions. Note that u d as given by (11) decays as x → ∞, and by the superposition principle it satis es (1)- (2). Also note that completion sources are placed inside the particles since the ow is external to the particles, but not inside the walls since the ow is internal to the walls (for details we refer to [42, sec. 4.5]). On the other hand, the double layer density q is de ned on the surfaces of both the particles and walls. The formulation (11) is complete, meaning that any ow which satis es (1)-(2) and decays as x → ∞ can be represented in this way.
To derive the fundamental boundary integral equation, which is used to determine the double layer density q in (11), we insert (11) into (10) and then let x ∈ Ω approach the solid boundary Γ. Enforcing no-slip boundary conditions on Γ yields, recalling the jump condition (4), The presence of the term −4πq(x), which is due to the jump condition, makes the boundary integral equation (12) a Fredholm integral equation of the second kind. The right-hand side U Γ is the pointwise velocity of the boundary Γ. We assume the walls to be stationary and the particles to move as rigid bodies. This means that, if we let Γ w be the union of all wall surfaces and Γ (α ) p the surface of particle α, where U (α ) RBM and Ω (α ) RBM are the translational and angular velocity, respectively, of particle α (with RBM denoting rigid body motion).
As mentioned in section 1, the viscous resistance that the particles experience from the uid is related to their velocities. In the resistance problem, the velocities (i.e. U (α ) RBM and Ω (α ) RBM for each particle) are speci ed in (12)- (13), while in the mobility problem, the viscous forces and torques (i.e. F (α ) and τ (α ) for each particle) are speci ed [42, p. 129]. The boundary integral equations resulting from these two problems are described in more detail below. In both cases, the resulting integral equation is discretized using the Nyström method, as described in section 3.

The resistance problem
In this case, the velocities U (α ) RBM and Ω (α ) RBM of all particles are known, while the corresponding forces F (α ) and torques τ (α ) are to be computed. Following [42, p. 130], the forces and torques are related to the unknown double layer density q by stipulating These relations are inserted into (12), which can then be rearranged as After solving this integral equation for q, the forces and torques can be computed using (14), and the ow eld can then be computed using (10)-(11).

The mobility problem
In this case, the force F (α ) and torque τ (α ) exerted on the uid by each particle (which for a noninertial particle are equal to the net external force and torque acting on the particle) are known, but not the particle velocities U (α ) RBM and Ω (α ) RBM . Following [42, p. 135], the velocities are related to the double layer density q by Here, |Γ (α ) p | is the surface area of Γ (α ) p , and while ω (α ) n are three linearly independent unit vectors which must satisfy 1 The boundary integral equation (12) can then be rearranged as where U Γ [q] is given by (13) but with U (α ) RBM and Ω (α ) RBM replaced by the expressions in (16) and (17), respectively. After solving (20) for q, the velocities can be computed using (16)- (17), and the ow eld can be computed using (10)-(11).

Discretization and quadrature
In order to solve the boundary integral equation (15) associated with the resistance problem, or the boundary integral equation (20) associated with the mobility problem, the integral operators in these equations must be discretized. This amounts to discretizing the double layer potential D, as well as the integrals occurring in relation (14) for the resistance problem, or relation (16)- (17) for the mobility problem. Following [25], we introduce the notation for the integral of the arbitrary function f over the surface Γ. We introduce a quadrature rule Q N called the direct quadrature rule, de ned by a set of N nodes x i ∈ Γ and weights w i ∈ R, i = 1, . . . , N . The details of this quadrature rule is speci ed in sections 3.1 and 3.2. Using the direct quadrature rule Q N , the integral in (21) can be approximated as We denote an integral quantity approximated by Q N with a superscript h, for example the double layer potential We then discretize (15) For the mobility problem, (20) becomes The superscript h on V (α ),h in (24) and U h Γ in (25) signi es that these quantities, while not integrals themselves, contain integrals -namely (14) or (16)-(17) -which are approximated using the direct quadrature rule Q N . In both cases, the resulting linear system is solved iteratively using the generalized minimal residual method (GMRES).
In this paper, we consider two distinct types of geometrical objects, namely particles and walls, as indicated by Figure 2. Particles are mobile rigid bodies immersed in the uid, while walls are stationary and surround the uid domain. We consider two types of particles: spheroids, which are given by a surface in local coordinates; and rods, which consist of a cylinder with rounded caps, described in appendix B. We also consider two types of walls, namely plane walls and pipes with circular cross section. Both wall geometries extend to in nity in the periodic setting, but we discretize only the part of each object that lies inside the primary cell.

Particle-global quadrature
Plane wall Pipe (b) Walls Local patch-based quadrature F 2. The geometrical objects considered in this paper are (a) particles (spheroids and rods) and (b) walls (plane walls and pipes). The quadrature rule is di erent for particles and walls.
The nature of the direct quadrature rule Q N is di erent for particles and walls: for particles, it is a particleglobal quadrature rule described in section 3.1, while for walls it is a local patch-based quadrature rule described in section 3.2. The special quadrature method for particles and walls is introduced in section 3.3.
It should be noted that all geometrical objects shown in Figure 2 are smooth, i.e. of class C ∞ . The construction of the smooth rod particles is described in appendix B. In section 8, we consider the e ect that a nonsmooth object would have on the convergence of the special quadrature method.

Direct quadrature for particles
The discretization and direct quadrature rule of the spheroids are exactly the same as in [25], while for the rods they are a slight variation of the former. Both kinds of particles are axisymmetric, and their parametrizations take this into account, with one parameter φ ∈ [0, 2π ) varying in the azimuthal direction and the other parameter θ ∈ [0, π ] varying in the polar direction. For instance, the spheroid (26) is parametrized using spherical coordinates as It is discretized using a tensorial grid with n θ × n φ grid points. For the polar direction, let (θ i , λ θ i ), i = 1, . . . , n θ , be the nodes and weights of an n θ -point Gauss-Legendre quadrature rule [38, sec. 3.5(v)] on the interval [0, π ]. For the azimuthal direction, let (φ j , λ φ j ), j = 1, . . . , n φ , be the nodes and weights of the trapezoidal rule on the interval [0, 2π ). Since the integrand is periodic on this interval, the trapezoidal rule has spectral accuracy in this case [56]. The resulting tensorial quadrature rule, called the direct quadrature rule of the spheroid, is where W sph (θ, φ) is the area element associated with the parametrization (27). The rod consists of a cylinder with rounded caps. While the surface is smooth everywhere, the grid is divided into three parts as shown in Figure 3. The reason for this is to be able to increase the resolution at the caps independently of the resolution at the middle of the rod. 5

F
3. The grid on the rods consists of three parts: two caps and a middle cylinder.
The rod is parametrized as where L is the length of the rod and R its radius. The shape functions ϱ(· ; L, R) : [0, π ] → [0, R] and β(· ; L, R) : [0, π ] → [− 1 2 L, 1 2 L] are described in appendix B. They are chosen such that θ ∈ [0, π /3] = I 1 and θ ∈ [2π /3, π ] = I 3 correspond to the two caps, while θ ∈ [π /3, 2π /3] = I 2 corresponds to the middle part. Each cap is discretized using n 1 ×n φ grid points, and the middle cylinder is discretized using n 2 ×n φ grid points, so the total grid has (2n 1 +n 2 )×n φ grid points. The trapezoidal rule is again used in the azimuthal direction. In the polar direction, a separate Gauss-Legendre quadrature rule is used for each of the three parts. The tensorial direct quadrature rule of the rod is thus (with n 3 = n 1 ) where (θ k i , λ k i ), i = 1, . . . , n k , are the nodes and weights of an n k -point Gauss-Legendre quadrature on the interval I k , and W rod (θ, φ) is the area element associated with (29).
The direct quadrature rules (28) and (30) are both particle-global in the sense that each particle is treated as a single unit, and the quadrature rule is applied to the particle as a whole. The quadrature rules has spectral accuracy for smooth integrands, i.e. it converges exponentially as the number of grid points increases.

Direct quadrature for walls
The wall geometries are present only in the periodic setting, and then only the part inside the primary cell needs to be discretized. For the plane wall, this part consists of a at rectangle of size L 1 × L 2 , which is divided into P 1 × P 2 at subrectangles, called patches. Each patch is discretized using a tensorial grid with n 1 × n 2 Gauss-Legendre grid points, as shown in Figure 4 (a). In each direction of the patch, an n d -point Gauss-Legendre quadrature rule is used with nodes and weights (s d i , λ d i ), i = 1, . . . , n d , d = 1, 2. The resulting tensorial direct quadrature rule of the patch is where W wall (s 1 , s 2 ) is the area element of the wall.  The part of the pipe in the primary cell consists of a cylinder with radius R c and length L c . Like the plane wall, it is divided into rectangular patches, but these are curved, as seen in Figure 4 (b). Apart from that, the discretization and quadrature rule are the same as for the plane wall. The direct quadrature rule of the pipe is thus also given by (31), the only di erence compared to the plane wall being the area element W wall and the parametrization (s 1 , s 2 ) → x.
The direct quadrature rule (31) is local in the sense that the wall is subdivided into smaller patches, and the quadrature rule is applied to each patch separately. The grid can be re ned in two di erent ways: by adding more grid points to each patch (which we call n-re nement), or by reducing the size of the patches and thus increasing their number (which we call P-re nement). Under n-re nement, the quadrature rule has spectral accuracy like the direct quadrature rule of the particles, while under P-re nement the quadrature rule is polynomially accurate with order determined by n 1 and n 2 .
3.3. Special quadrature: upsampled quadrature and quadrature by expansion (QBX) The double layer potential D given by (3) is challenging to compute using direct quadrature in two di erent situations, in both cases due to its kernel T . Firstly, when the evaluation point x is on Γ itself, T becomes singular at the point = x (we refer to this as the singular case or the onsurface evaluation case). The integral exists as an improper integral as long as Γ is a Lyapunov surface [42, p. 37], but clearly a special quadrature method of some sort is needed to compute it. Secondly, when x ∈ Ω is close to Γ, but not on Γ, T becomes very peaked and thus hard to resolve using the direct quadrature rule (we refer to this as the nearly singular case or the o surface evaluation case).
The singular case is always present when solving the boundary integral equation (12), while the nearly singular case occurs when particles are close to each other or close to a wall, and also if the ow eld (10)-(11) is to be computed close to a particle or wall. The latter situation is illustrated in Figure 5   5. Relative error in the centre plane when evaluating the stresslet identity (5) for two rod particles between a pair of parallel horizontal plane walls in a periodic setting, using the direct quadrature rule in (a) and the upsampled quadrature rule with upsampling factor κ = 2 in (b). Note that the error is still large very close to the particles and walls in (b). The density q is constant in this example, so the error in (a) and (b) comes entirely from the nearly singular behaviour of the stresslet T . are error estimates available for the Laplace and Helmholtz potentials in [26] and for the Stokes potential in [39]. To compute the double layer potential accurately close to a particle or wall, special quadrature is needed. Here, we consider two types of special quadrature: upsampled quadrature and quadrature by expansion (QBX).
Assuming that the density q itself is well-resolved on the grid, upsampled quadrature provides a partial solution for the nearly singular case. In upsampled quadrature, the double layer density q is interpolated onto a grid re ned by a factor κ in both directions, and the integral is then evaluated using direct quadrature on the ner grid. For the particle-global quadrature rules in section 3.1, the grid of the whole particle is re ned (increasing the number of grid points of each individual quadrature rule). The density is interpolated onto the ner grid using trigonometric interpolation in the azimuthal direction and barycentric Lagrange interpolation [7] in the polar direction. For the local patch-based quadrature rules in section 3.2, only the N P patches closest to the evaluation point x are re ned, using n-re nement (thus increasing the number of grid points on them); other patches are su ciently far away from the singularity that direct quadrature can be used. This is illustrated in Figure 6 for N P = 9. The re nement has spectral accuracy for both particles and walls. Since all geometrical objects are rigid, interpolation matrices can be precomputed.  6. Upsampled quadrature for a plane wall: the N P patches closest to the evaluation point (marked with ×) are re ned, here for N P = 9. The other patches are treated using direct quadrature without re nement.
As Figure 5 (b) shows, upsampled quadrature pushes the region where the error is large closer to the boundary Γ. However, the error will always be large very close to Γ no matter how large the upsampling factor κ is. To be able to achieve small errors arbitrarily close to Γ, we use a special quadrature rule speci cally designed for layer potentials with singular kernels, namely quadrature by expansion (QBX) [29,3]. The idea behind QBX is to make a local series expansion of the potential D in the uid domain, which converges rapidly since D is smooth all the way up to the boundary Γ. The expansion is made around a point c, called the expansion centre, which is inside the uid domain (i.e. not on Γ), and it can be used to evaluate the potential inside a ball around c called the ball of convergence, as shown in Figure 7. The expansion is valid even at the point where the ball touches Γ [14], and can therefore be used in the singular case as well as the nearly singular case. The application of QBX to the Stokes double layer potential D will be described in detail in section 4.
The idea behind QBX is to make a series expansion of the potential close to a particle (a) or wall (b), valid inside a ball of convergence shown as a blue disc. The expansion is also valid at the point where the ball of convergence touches the boundary.
In this paper, we use a combined quadrature strategy, where direct quadrature is used far away from the boundary, upsampled quadrature is used in an intermediate region, and QBX is used in a small region closest to the boundary, as illustrated in Figure 8. For each particle and wall in the geometry, the evaluation point x is classi ed into one of these three regions, and the contribution to the double layer potential D from that particle or wall is computed as follows: • If x is in the direct quadrature region, the double layer potential (3) is computed using direct quadrature (23) over the whole particle or wall, as described in sections 3.1 and 3.2.
• If x is in the upsampled quadrature region, the behaviour is di erent for particles and walls, as described above. For a particle, the density is upsampled globally on the whole particle surface and then integrated using direct quadrature on the ne grid. For a wall, the density is upsampled only on the N P patches closest to the evaluation point, while direct quadrature without upsampling is used on the other patches (as in Figure 6).
• If x is in the QBX region, the behaviour is similar to the upsampled quadrature region. For a particle, the density on the whole particle surface is used when computing the coe cients of the local expansion which is then used at the evaluation point (particle-global QBX). For a wall, only the density on the N P patches closest to the expansion centre c is used to compute the expansion (local QBX), while the contribution from other patches is computed using direct quadrature. In other words, the expansion is computed using a truncated wall, with N P determining the number of patches in the truncated wall. The di erence between particle-global and local QBX is described in more detail in section 4.2.
The total double layer potential at x is then retrieved using superposition, i.e. by summing the contributions from all particles and walls.

Direct quadrature
Upsampled quadrature QBX (b) F 8. The regions of the combined quadrature strategy, shown here for a rod particle in (a) and a plane wall in (b). Depending on the location of the evaluation point x, it is treated using direct quadrature, upsampled quadrature or QBX.
When using local QBX, the convergence rate of the local expansion will depend on the ratio between the distance from c to the wall and the distance from c to the edge of the truncated wall [49]. We have observed that N P = 1 is too low for the wall QBX region in our case, since the expansion centre may then be too close to the edge of the truncated wall. Setting N P = 9 seems to be su cient to remedy this, and increasing N P further has no e ect. We therefore x N P = 9 both for the QBX region and the upsampled quadrature region of plane walls and pipes, for the rest of this paper.
The distances from the surface at which to switch from one quadrature region to the next (i.e. direct quadrature, upsampled quadrature, QBX) are parameters to be set, and these will be discussed in section 6.

Quadrature by expansion for the Stokes double layer potential
In order to apply QBX to the Stokes double layer potential D given by (3), we need to be able to write down a local expansion of the potential. We use the same approach as in [25], which is summarized in section 4.1. The di erences between particles (for which particle-global QBX is used) and walls (for which local QBX is used) are summarized in section 4.2. Finally, the precomputation scheme which is crucial for accelerating the method is described in section 4.3.

Local expansion of the double layer potential
Instead of expanding the double layer potential D itself directly, we use the fact that D can be expressed in terms of the so-called dipole potential L using the relation [55,25] whereΓ is any subset of Γ. The dipole potential is the double layer potential of the Laplace equation and is de ned as The kernel of the dipole potential has a natural expansion based on the so-called Laplace expansion where Y m l is the spherical harmonics function of degree l and order m (de ned as in [14, Eq. (3.5)]), while (r x , θ x , φ x ) and (r , θ , φ ) are spherical coordinates of the points x and respectively, with respect to a chosen expansion centre c, as shown in Figure 9. The expansion (34) is valid as long as r x < r , i.e., it can be used for all x within the ball of radius r QBX = min ∈Γ r centred at c.
Illustration of the points x, and c, and the ball of convergence of (34).

Inserting (34) into (33) leads to the expansion
of the dipole potential, where the coe cients z lm [ρ] are given by These coe cients are complex-valued due to Y m l , but the dipole potential L itself is real. Since the spherical harmonics functions satisfy Y −m l = (Y m l ) * , the coe cients also satisfy z l,−m = (z lm ) * , where the asterisk denotes the complex conjugate. It is therefore enough to compute the coe cients for m ≥ 0. The expansion (35) is in fact valid also at the point where the ball in Figure 9 touchesΓ (where r x = r QBX ), as established in [14]. This means that the expansion can be used both for o surface evaluation (in the interior of the ball, where the double layer potential is nearly singular) as well as onsurface evaluation (at the point onΓ closest to c, where the double layer potential is singular).
Relation (32) allows us to express D using four dipole potentials with densities Each of the four dipole potentials is expanded using (35), with coe cients given by (36), which together with (32) provides a local expansion of the Stokes double layer potential. In practice, the expansion (35) must be truncated, which is done at l = l max = p QBX . This results in the approximation The coe cients z h lm, j = z h lm [ρ (j) ] are here computed using the upsampled quadrature rule introduced in section 3.3, with upsampling factor κ = κ QBX . Upsampling is needed since the integrand in (36) becomes quite peaked for large l. However, the cost of upsampling can be entirely hidden in a precomputation step, as explained in section 4.3. The number of coe cients that needs to be computed in (38) for each j is which takes into account that only coe cients with m ≥ 0 need to be computed directly.
If the expansion (35) is absolutely convergent, the terms must decay in magnitude as l → ∞. The size of the terms can be estimated using the bound where we have used the fact that [14, Eq.
For a single dipole expansion such as (38) with a xed j, the bound (40) with z lm = z h lm, j provides a way to estimate the decay of the terms and thus the truncation error of the truncated expansion. It is however not directly applicable to the Stokes double layer potential, which involves derivatives of dipole potentials as seen in (32). To estimate the truncation error for the Stokes double layer potential, we instead evaluate the error directly in the grid points, as explained in section 6.1.2.
In summary, to compute D[Γ, q](x) using QBX, the density q is rst upsampled to a ner grid with upsampling factor κ QBX and then converted into four dipole densities using (37). From these, four sets of dipole coe cients z h lm, j are computed using the direct quadrature rule on the re ned grid. The coe cients are used to evaluate the dipole potentials (38), from which the Stokes double layer potential D can be computed using (32). Note that the derivatives with respect to x in (32) can be computed analytically.
Since QBX can be used for both onsurface and o surface evaluation, it is useful to introduce one expansion for each grid point. For each grid point x i on the boundary, an expansion centre c + i is thus placed at a distance r QBX away from the boundary in the normal direction (i.e. in the uid domain). This expansion centre can be used to evaluate the double layer potential in a ball touching that grid point. In practice the balls of convergence of neighbouring expansion centres will overlap, and for a given evaluation point the closest expansion centre is used to evaluate the QBX potential.
For onsurface evaluation (but not o surface evaluation), we also use a second expansion centre c − i for each grid point, placed at a distance r QBX away from the boundary in the negative normal direction (i.e. outside the uid domain), as shown by Figure 10. The reason for this is that it signi cantly improves the convergence when solving the boundary integral equation using GMRES, since the spectrum of the discrete operator better matches that of the continuous operator, as was noted in [29,45,25]. Note that due to the jump condition (4), the correct value of the potential onΓ is the average of the values from the two sides: where D + is the limit from the uid domain and D − is the limit from the other side ofΓ. While using two expansions may seem to double the computational cost, the extra cost appears only in the precomputation step, as described in section 4.3, and thus does not a ect the cost of evaluation itself. There are two sources of error in the QBX approximation: the truncation error due to the fact that the expansion in (38) is truncated at l = p QBX , and the coe cient error (called the "quadrature error" in [14,25,26]) due to the fact that the coe cients (36) are computed using a quadrature rule with nite precision. These two errors are controlled by the following three QBX parameters: • The expansion radius r QBX , which is the distance from the expansion centre toΓ and also the radius of the ball in which the expansion is valid. Increasing r QBX makes the truncation error grow since the ball of convergence (and hence r x ) becomes larger, but the coe cient error decreases since the integrand in (36) becomes easier to resolve as r becomes larger.
• The expansion order p QBX , which governs the number of terms to be included in the sum in (38). Increasing p QBX makes the truncation error decrease since more terms are included, but the coe cient error grows since the integrand in (36) is harder to resolve for larger l.
• The upsampling factor κ QBX , which governs the amount of grid re nement when computing the dipole coefcients (36). Increasing κ QBX makes the coe cient error decrease since the resolution of the underlying quadrature rule increases.
A simple way to decrease both the truncation error and coe cient error is to increase p QBX and κ QBX simultaneously while keeping r QBX xed. We will continue to discuss how the QBX parameters should be selected to achieve a small overall error in section 6. For a more in-depth analysis, we refer to [14] for the truncation error, [26] for the coe cient error, as well as the summary in [25, sec. 3.5].

Global and local QBX
As was mentioned in section 1.1, a QBX method can be either (fully) global, particle-global or local, the di erence being which part of the boundary (i.e. which source points) to include when forming the local expansion. Here we use particle-global QBX for particles and local QBX for walls. In essence, the di erence between the three variants is whatΓ in section 4.1 is taken to be: • For a fully global QBX method, all grid points on the whole boundary are used to form the local expansion, i.e.Γ = Γ.
• For a particle-global QBX method, all grid points on a single particle are used to form the expansion, i.e.
p is the surface of the particle with index α.
• For a local QBX method, only the grid points which are close to the expansion centre are used to form the expansion. In our case, we chooseΓ to be the N P patches of the wall which are closest to the expansion centre, as shown in Figure 6. (The contribution from patches further away is not included in the expansion but computed using direct quadrature.) Note thatΓ may depend on the location of the expansion centre to be used, which in turn depends on the evaluation point. In principle, it is su cient to letΓ consist of the grid points close to the expansion centre (i.e. local QBX), since that is where the integrand becomes nearly singular; for grid points further away, direct quadrature can be used. The reason to extendΓ further is to improve the regularity of the layer potential that is being expanded, so that the expansion converges more rapidly. Indeed, in local QBX, the expanded layer potential consists of the contribution from a truncated part of the boundary, and may not be very smooth sinceΓ ends abruptly. However, the largerΓ is, the further away from the ball of convergence will the edge ofΓ be, and the less will it a ect the convergence of the expansion. We have observed that N P = 9 is su cient forΓ for the walls.
In particle-global QBX, the expanded layer potential has the contribution from a whole particle, which consists of a closed and smooth surface, so the potential from it should be smooth. In a fully global QBX, the expanded layer potential is the global potential, which is smooth if Γ is regular enough. Unlike the particle-global QBX, the fully global QBX quickly becomes expensive unless a fast method (such as the FMM) is used to compute the far-eld contribution. Therefore the fully global QBX variant is not used in this paper.
An advantage of the local and particle-global QBX variants over the fully global QBX is that, if the individual particles and walls are rigid,Γ is the same (in local coordinates) for all particles or patches of the same shape, even if they have di erent orientations. This makes precomputation possible, which we shall return to in section 4.3. Another advantage is that expansion centres can be placed without regards to other particles or walls, since each expansion contains only the contribution from a single particle or wall segment. In a fully global QBX method, each ball of convergence must be completely outside all particles and walls, which would complicate the placement of the expansion centres.

Precomputation for QBX
The mapping given by (37) and the discrete version of (36), which takes the double layer density q onΓ and returns the dipole coe cients z h lm, j for a single expansion centre c i , is a linear function of q and can therefore be represented by a matrix M i . This matrix is of size 4N QBX × 3Ñ , where N QBX is given by (39) andÑ is the number of grid points onΓ (before upsampling). There is one such matrix M i for every expansion centre, and it depends only on the geometryΓ, its discretization and the location of the expansion centre in the local coordinates ofΓ. For a rigid geometryΓ, such as in our case, the matrix M i can therefore be precomputed and stored.
Note that the upsampling factor κ QBX is e ectively "hidden" in this precomputation step: upsampling in uences the computation of M i since q is upsampled before being inserted into (37), but it has no e ect on the size of M i , which is set by the discretization ofΓ prior to upsampling. Therefore, upsampling does not a ect the computational complexity of the method once M i has been precomputed.
The matrix M i which computes the coe cients z h lm, j is used for o surface evaluation, when the evaluation point is not known beforehand; the coe cients can then be used to evaluate the expansion at any evaluation point within the ball of convergence. For onsurface evaluation, i.e. evaluation at one of the grid points of the boundary, the evaluation point itself is known beforehand and precomputation can be taken even further. In fact, the mapping that takes the expansion coe cients to the value of the potential D[Γ, q](x i ), given by (38) and (32), is also linear and can therefore be represented by a matrix S i . This allows us to compute a matrix R i = S i M i which maps the density q onΓ directly to the value of the double layer potential D at one of the grid points -e ectively representing a set of target-speci c quadrature weights for every grid point. The matrix R i is of size 3 × 3Ñ and there is one such matrix for each grid point x i on the boundary. Precomputing the R i matrix hides not only κ QBX but also p QBX .
Since two expansion centres are used for onsurface evaluation, as the reader may recall from Figure 10, there are actually two R i matrices for each grid point: (42), it is clear that these matrices can be combined as to form a single matrix R i for each grid point. This way, the extra cost of using two expansions is completely hidden in the precomputation step. For the particles, the axisymmetry can be used to vastly reduce the amount of computations and storage needed to precompute the matrices M i and R i . In fact, due to re ective symmetry, it su ces to compute R i for the n θ /2 grid points (n 1 + n 2 /2 grid points for rod particles) shown in Figure 11, and M i for the corresponding expansion centres. The matrices for all other grid points and their expansion centres are then calculated using rotations and re ections, as in [25]. Note that if several particles of the same shape appear in a simulation, the precomputation only needs to be done for one such particle. F 11. The matrices M i and R i need only be stored for the grid points along half a line of longitude, here indicated with red dots.
For a wall geometry with uniform patch size, as in our case, the geometry has a discrete translational symmetry for o sets equal to the patch size, due to periodicity. This means that the geometry "looks" exactly the same seen from any patch of the wall, and it is therefore enough to precompute the M i and R i matrices for the n 1 n 2 grid points and corresponding expansion centres of a single patch of the wall. In this caseΓ consists of that patch and its N P − 1 closest neighbours, as indicated in Figure 6 for N P = 9.

Periodicity and fast methods
Up to this point we have not taken periodicity into account in the description of the mathematical formulation and its discretization; it is now time to remedy this. We will here give the details of the periodic formulation indicated in Figure 1 (b), and in particular focus on how the special quadrature methods are combined with the fast summation method used for the periodic problem.
Consider a primary cell with side lengths B = (B 1 , B 2 , B 3 ) which is replicated periodically in all three spatial directions. The ow eld is then periodic, i.e. u(x) = u(x +k · B) for any k ∈ Z 3 . This changes the boundary integral formulation introduced in section 2.1 in the following way: The layer potential D and completion ow V (α ) which appear in the ow eld (11) and in the fundamental boundary integral equation (12) are replaced by their periodic counterparts D 3P and V (α ),3P . These are de ned as in nite sums over the periodic lattice, i.e.
These sums converge slowly, and their value depends on the order of summation, so they cannot be computed using direct summation. We compute them using the Spectral Ewald (SE) method [31,32], a fast Ewald summation method based on the fast Fourier transform (FFT). The SE method is described in detail for the stokeslet in [31], for the stresslet in [23] and for the rotlet in [24], and has been combined with QBX previously in [25]. In the SE method, each of the periodic sums in (44) is split into two parts: the real-space part, which decays fast and can therefore be summed directly in real space; and the Fourier-space part, which is smooth and therefore decays fast in Fourier space.
No special treatment is needed for the completion ow V (α ),3P since the evaluation point is never close to the singular points (which are inside the particle), so the SE method as described in [31,24] is used without modi cation. For the double layer potential D 3P , special quadrature is needed so SE must be combined with QBX and the upsampled quadrature rule. How this is done is described below.
The periodic sum for the double layer potential can be written explicitly as The stresslet T is split into two parts where T R is the real-space part and T F is the Fourier-space part. The Ewald parameter ξ is a positive number which is used to balance the decay of T R in real space and the decay of the Fourier coe cients of T F (a larger value of ξ makes the real-space part decay faster and the Fourier-space part decay slower, thus shifting computational work into Fourier space). In the split that we use, T R is given by [23,25] where r = |r |. The Fourier-space part is simply given by T F = T − T R . Inserting (46) into (45) splits the periodic double layer potential into two parts The singularity of the stresslet is completely transferred to T R , while T F is nonsingular [25]. The Fourier-space part (49) is computed using FFTs as described in appendix D and [23]. The real-space potential (48) is evaluated in real space, and requires special quadrature due to the singularity of T R , much as in the free-space setting. Note that since T R (r ; ξ ) decays fast as |r | → ∞ it can be neglected for |r | > r c , where r c is called the cuto radius. We can thus change the integration domain in (48) to Γ = Γ (x, k; r c ) = { ∈ Γ : |x + k · B − | ≤ r c } and approximate The error of this approximation is determined by the product ξr c as described in [23]. Rather than deriving a new QBX expansion from scratch for the real-space part D 3P,R , we reuse the expansion of the total layer potential D from section 4. To be able to do this, we insert Note that the integration domain Γ ensures that both of these sums have few terms since r c should be smalltypically smaller than the size of the periodic cell. The integral in (51) represents the total layer potential from Γ and can thus be computed using the combined special quadrature method from section 3.3, with truncation at r c . The integral in (52) is computed using direct quadrature, which is possible since T F is nonsingular. As in the free-space setting in section 3.3, the evaluation point x is classi ed into one of three regions (see Figure 8). The potential is evaluated using (50) in the direct quadrature region and (51)- (52) in the other two regions. The reader may wonder why we in the upsampled quadrature region do not simply evaluate (50) using upsampled quadrature. The reason is that (52) evaluated using the same quadrature method as in the Fourier-space part-i.e. direct quadrature-is needed to cancel discretization errors in the latter. 6 These discretization errors may be larger than the SE error tolerance and are caused by the fact that T F (r ; ξ ), while nonsingular, tends to become slightly peaked for small |r |, i.e. close to Γ . Cancellation prevents these errors from in uencing the error of the full method.
Another important point to note is that r c must be chosen large enough so that no special quadrature is needed for the total potential when |r | > r c . This is because, for |r | > r c , the total potential is equal to the Fourier-space part, which is always computed using direct quadrature. Thus, r c must be at least as large as the distance from Γ to the direct quadrature region.

Parameter selection
In this section, we develop our strategy for selecting the parameters of the combined special quadrature, i.e. upsampled quadrature and QBX, when evaluating the Stokes double layer potential D. We assume that a discretization of the geometry is given, with su cient resolution for the density to be well-resolved and the direct quadrature to achieve a given error tolerance ε tol at a given distance (su ciently far away) from all surfaces. The goal is to select quadrature parameters for each particle and wall so that the error tolerance ε tol is achieved also in the upsampled quadrature region and QBX region. Of course, there are many di erent ways to choose the parameters, some resulting in higher computational e ciency than others. Here, we do not aim to optimize the e ciency; instead, our focus is on achieving the given error tolerance at an acceptable (albeit not optimal) computational cost.
The parameters that must be selected are shown in Figure 12. Note that we allow for multiple upsampled quadrature regions with di erent upsampling factors κ Ui , in order to gradually increase the upsampling closer to the surface. Due to the precomputation scheme for QBX, using QBX may in fact be faster than using upsampled quadrature with the same upsampling factor. Therefore, the QBX region may extend further away from the surface than the expansion centre (i.e. d QBX may be larger than r QBX , but of course not larger than 2r QBX ).
The parameters to be selected are as follows: • The threshold distances d Ui for the upsampled quadrature regions, i = 1, 2, . . . , N U , and the threshold distance d QBX for the QBX region. These determine at what distance from the surface each region starts. If d Γ (x) is the distance from the evaluation point x to the surface Γ, then x belongs to the ith upsampled quadrature region if and x belongs to the QBX region if d QBX ≥ d Γ (x) ≥ 0. Each of these distances should be chosen so that the error does not exceed the tolerance in the region further away from the surface (for example, d U1 is selected based on the direct quadrature error).
Direct quadrature Upsampled quadrature QBX F 12. Parameters of the combined special quadrature, here for a rod with N U = 3 upsampled quadrature regions (each geometrical object has its own set of corresponding parameters). Note that while the ball of convergence for QBX can be larger than the QBX region, the expansion is used only inside the QBX region.
• The upsampling factors κ Ui for the upsampled quadrature regions, i = 1, 2, . . . , N U . These should be increasing, i.e. κ U1 < κ U2 < . . . < κ UN U . The upsampling factor κ Ui determines the distance d U(i + 1) at which the next region must begin, which we will come back to in section 6.1.
• The QBX upsampling factor κ QBX , which controls the amount of upsampling used when computing the coecients in (36), and thus the QBX coe cient error as mentioned in section 4.1. It should be chosen large enough so that the coe cient error and the truncation error are balanced. The upsampling factor in uences the QBX precomputation time (which grows like O(κ 2 QBX )), but not the size of the precomputed M i and R i matrices, and thus not the evaluation time.
• The QBX expansion order p QBX , which controls the number of terms included in the expansion in (38), and thus the QBX truncation error as mentioned in section 4.1. It should be chosen so that the truncation error is below the error tolerance everywhere in the QBX region. The expansion order a ects the size of the M i matrix used for o surface evaluation (which grows like O(p 2 QBX )), but not that of the R i matrix used for onsurface evaluation. It should be noted that as p QBX increases, the upsampling factor κ QBX must also increase since higher-order coe cients are harder to resolve.
• The QBX expansion radius r QBX , which a ects both the coe cient error and the truncation error, but neither the precomputation time nor the evaluation time directly. It should typically be chosen as small as possible, since this speeds up the convergence of the expansion in (38) so that p QBX can be chosen small. On the other hand, a very small r QBX means that the upsampling factor κ QBX must be large, since the expansion centre moves closer to the surface.
However, in our implementation the primary restriction on r QBX is that it must be large enough for the balls of convergence to cover the QBX region su ciently well. In general, r QBX should not be smaller than the distance from one expansion centre to the next, to ensure a good coverage. Since we have one expansion centre per grid point, we require that r QBX should not be smaller than the grid spacing. 7 Letting h be some measure of the grid spacing (for example the largest distance between neighbouring grid points on the surface), it is useful to consider the ratio r QBX /h when selecting parameters. As noted in [25], this has the advantage that if r QBX /h is kept xed during re nement of the original grid, then the coe cient error is constant, assuming that the upsampling factor κ QBX is also xed. We will therefore consider r QBX /h in the rest of this section.
Unfortunately, no general error estimates are available in three dimensions for the quadrature rules that we use here. The parameters must therefore be selected based on numerical experiments, and we present a strategy for doing so here. The idea is to start from the outermost upsampled quadrature region (U1) and then proceed inwards towards the surface of the particle or wall, determining the parameters in the following order: 1. Threshold distances and upsampling factors for the upsampled quadrature regions, 2. The QBX parameters r QBX /h and d QBX , 3. The QBX parameters p QBX and κ QBX .
The process must be repeated for each type of particle and wall to be used. We develop the strategy in the context of a speci c rod particle in section 6.1; a summary of the parameter selection strategy in the general case follows in section 6.2. In sections 6.3 and 6.4 we apply the strategy to two more examples (a rod with a higher aspect ratio and a plane wall).
In order to estimate the error during the parameter selection process, we apply a constant densityq such that |q| = 1 to the surface and evaluate the stresslet identity (5). 8 This may seem like an overly simple test case since both the density and the solution are constant. However, the QBX expansions are not of the constant double layer potential D itself, but of the four dipole potentials L de ned by (32), and these are not constant. In practice, the stresslet identity seems to provide a decent test case for both direct quadrature, upsampled quadrature and QBX, as shown by the results in section 7.2 where the density is not constant. The Spectral Ewald parameters ξ and r c will not be discussed at length here, but we note that the requirement that no special quadrature be needed for |r | > r c implies that r c must be at least as large as d U1 . The Spectral Ewald error is determined by the product ξr c in real space and ξh F in Fourier space, where h F is the grid spacing of the uniform grid used for the Fourier-space part (see appendix D). Given a tolerance ε tol , the parameters ξ , r c and h F must satisfy the system ξr c = A(ε tol ), ξh F = B(ε tol ), where A and B are known functions. This leaves one degree of freedom which can be used to minimize the computational cost, albeit under the constraint r c ≥ d U1 . For a general discussion on the selection of Spectral Ewald parameters, including the functions A and B, we refer to [23] for the stresslet, [31] for the stokeslet, and [24] for the rotlet.
6.1. Introductory example: a rod particle with low aspect ratio In this rst example, we consider a rod particle of length L = 2 and radius R = 0.5 (i.e. aspect ratio 2), shown in Figure 13. The grid used for the direct quadrature has parameters n 1 = 40, n 2 = 10 and n φ = 25 (introduced in section 3.1), for a total of 2250 grid points. To give an idea of the error associated with the direct quadrature, we apply the constant densityq = (1, 1, 1)/ √ 3 to the particle surface and compute the stresslet identity (5) using direct quadrature in two planes intersecting the particle. The absolute error in these planes is shown in Figure 13.  13. Error in two perpendicular slices (a) and (b) through the rod particle, when evaluating the stresslet identity (5) using direct quadrature in free space.
To determine how the error varies with the distance to the surface, we evaluate (5) along several normal lines centred on grid points of the particle; due to the symmetry of the error it is enough to consider the n 1 + n 2 /2 = 45 lines shown in Figure 14 (a). The error along these lines is shown in Figure 14 (b). Given an error tolerance ε tol , the smallest distance at which the error does not exceed ε tol can be determined numerically. This distance is taken as d U1 . In this example, we will use the error tolerance ε tol = 10 −10 . As indicated in Figure 14 (b), the error reaches 10 −10 at d U1 = 1.061; special quadrature must be used within this distance to the surface. Having established the rst threshold distance d U1 , we now proceed to determine the rest of the parameters for the upsampled quadrature regions, in section 6.1.1.  14. Error of the direct quadrature along 45 normal lines centred on grid points of the particle surface. In (a), the normal lines are shown coloured in groups of ten. In (b), the error along each line is shown with the same colours (here, the blue and purple curves are obscured by the red curve). The line with the greatest error is coloured red; the error along this line reaches the value 10 −10 at distance 1.061 from the surface. (The smallest distance to surface included here is 0.01.)

Parameters for the upsampled quadrature regions
For the sake of simplicity we will always choose the upsampling factors to be κ Ui = i + 1, meaning that the rst upsampling factor will be κ U1 = 2, the next will be κ U2 = 3 and so on (this may not be the optimal strategy with regard to computational cost, but recall that our goal is not to optimize for computational e ciency). In order to determine the threshold distance d Ui of every upsampled quadrature region, we repeat the investigation from Figure 14 for di erent upsampling factors κ = 1, 2, 3, . . ., computing the stresslet identity error as a function of the distance to the surface for each upsampling factor. The maximal error at each distance is shown in Figure 15 (κ = 1 corresponds to Figure 14). The threshold distance d U(i + 1) is now taken as the distance at which the error curve corresponding to κ = κ Ui intersects the error tolerance ε tol (i = 1, 2, . . .). For instance, in this case the curve corresponding to κ = κ U1 = 2 intersects ε tol = 10 −10 around 0.391 = d U2 . This procedure sets all of the parameters for the upsampled quadrature, as shown in Table 1. However, at some point we must switch from upsampled quadrature to QBX, which is determined by the QBX threshold distance d QBX .
Selecting d QBX will also x the number of upsampled quadrature regions N U . We will determine d QBX together with the other QBX parameters in section 6.1.2. T 1. Parameters for the upsampled quadrature regions, tolerance 10 −10 .

Parameters for the QBX region
To understand how the QBX error behaves, we plot the o surface error from a single expansion in Figure 16 (a). Note that the QBX parameters used in this gure are not yet selected to achieve the error tolerance, but meant only to demonstrate the general behaviour of the error. Since the QBX error is the largest at the boundary of the ball of convergence (outside this ball the direct quadrature error is shown in Figure 16 (a)), it is su cient to measure the error at a point on this boundary, for example at the point where the ball touches the particle. Thus, we measure the QBX error at all the grid points of the rod -the onsurface error -shown in Figure 16 (b) for these particular QBX parameters.  16. Error when evaluating the stresslet identity (5) using quadrature by expansion with the parameters r QBX = h = π /25, p QBX = 18 and κ QBX = 15: (a) in a slice through the particle centre (i.e. o surface), using a single QBX expansion centre and direct quadrature outside the ball of convergence; (b) in the grid points of the particle (i.e. onsurface). Note in (b) that the error seems to be related to the curvature of the boundary; in particular the error is larger in areas where the curvature changes, viz. in the smooth transition from cylinder to cap. (Similar observations related to the convexity of the boundary were reported in [3] and [29].) For particles, we de ne the grid spacing h as the distance between grid points in the azimuthal direction at the equator of the particle, i.e.
where n φ is the number of grid points in the azimuthal direction (as de ned in section 3.1), R is the radius of the rod and a is the equatorial semiaxis of the spheroid (which appears in (26)). For the rod that we consider in this example, R = 0.5 and n φ = 25, so h = π /25 ≈ 0.1257. We now focus on selecting the parameters r QBX /h, p QBX and κ QBX such that the error is bounded by ε tol in the whole ball of convergence, for all QBX expansions of the particle. To do this, we consider the maximal onsurface error as we vary these three parameters, shown in Figure 17. As seen in Figure 17 (a), r QBX /h should be chosen as small as possible since this improves the decay of the truncation error as p QBX grows; if r QBX /h is small, p QBX can also be chosen small, which is important since the o surface evaluation time for QBX grows as O(p 2 QBX ). On the other hand, as Figure 18 shows, r QBX must not be too small compared to h, since then the balls of convergence would not overlap properly, and large areas of the QBX region would not be covered by any ball of convergence. 9 For this reason we require that r QBX ≥ h. In fact, since r QBX should be as small as possible, we will always set r QBX = h, so that r QBX /h = 1. To select p QBX and κ QBX , we use the data shown in Figure 17 (b), which is for r QBX /h = 1. As can be seen there, the truncation error is independent of κ QBX and depends only on p QBX , so we simply select the smallest p QBX such that the truncation error is below the tolerance. 10 Then we select the smallest κ QBX (restricted to multiples of ve for convenience) such that the coe cient error is no larger than the truncation error (i.e. such that the minimum point of the error curve is to the right of the selected p QBX ). For example, for ε tol = 10 −10 , we must choose p QBX = 40 and κ QBX = 20.
It remains to choose the threshold distance d QBX , which determines the extent of the QBX region as shown in Figure 18. Clearly d QBX cannot be larger than 2r QBX since then the balls of convergence would not reach the edge of the QBX region. Even with d QBX = 2r QBX , there would be areas in the QBX region, close to its edge, that would not be inside any ball of convergence. To mitigate this problem, we introduce a safety factor γ , derived in appendix C, and require that where γ = 0.85. As long as d QBX satis es (56), it can be chosen arbitrarily, in the sense that its value will not a ect Surface Γ r QBX h d QBX QBX region F 18. Balls of convergence with the parameters r QBX and d QBX and the grid spacing h marked. 9 Some areas of the QBX region will inevitably fall outside every ball of convergence no matter how large r QBX is. However, these areas are mainly very close to the surface but not at the grid points, where it is typically not necessary to evaluate the layer potential. 10 The dashed curves in Figure 17 where ρ = r QBX /h. This estimate was constructed for the rod particle in this particular example by applying curve tting to data from a parameter study similar to that shown in Figure 17 itself. Unfortunately, this experimental estimate is of limited use in the parameter selection process since it would have to be reconstructed for every new geometrical object (such as a rod with a di erent aspect ratio), while the data used to construct it can just as well be used directly to select p QBX .  Table 1, 169 is the largest threshold distance in this interval, and thus we select d QBX = 0.169 which means that N U = 3 upsampled quadrature regions are used.

Veri cation of selected parameters
To summarize, the parameters that were selected above for the rod in this example was, with error tolerance 10 −10 , If the selected QBX upsampling factor κ QBX seems large, recall that this parameter is completely hidden in the QBX precomputation step, as explained in section 4.3. To verify that the selected parameters keep the error below the tolerance, we plot in Figure 19 (a) the maximum error along the 45 lines that were used earlier (shown in Figure 14 (a)).
We also plot the error in two slices in Figure 20. The fact that the error slightly exceeds the tolerance at some points in (b) should come as no surprise, since we have used the error only along certain lines to select the parameters, not in the whole space. All the points where the tolerance is exceeded are close to the boundary between di erent quadrature regions and could thus be eliminated by adjusting the threshold distances slightly upwards (which we will however not do here). The parameter selection procedure is repeated for the same rod with the looser error tolerance 10 −6 . The parameters for tolerance 10 −6 are r QBX /h = 1, and the error is shown in Figure 19    21. Error in two perpendicular slices (a) and (b) through the rod particle, when evaluating the stresslet identity (5) using combined special quadrature with tolerance 10 −6 in free space. Each slice consists of 500 × 500 evaluation points. All points in (a) are below the tolerance, but 90 points in (b), marked red, are above the tolerance. The largest error is 9.214 × 10 −7 in (a) and 1.079 × 10 −6 in (b).

A note on the computational cost
While our parameter selection strategy does not try to optimize the computational cost, we naturally strive for a reasonably low cost. We therefore comment on the computational cost for the di erent quadrature methods considered here. The computational complexity for evaluating the layer potential using each quadrature method is shown in Table 2. The precomputation time (for constructing the interpolation matrices and QBX matrices), which is naturally independent of the number of evaluation points, is not included. Note that the total evaluation time depends on the number of evaluation points in each quadrature region, and also on the number of expansions that are used for the evaluation points in the QBX region (recall that the closest expansion centre is used for each evaluation point).
An example of evaluation times for a speci c computer machine is given in Table 3, again excluding precomputation. The time required to nd the closest expansion centre for each evaluation point in the QBX region has been omitted from Tables 2 and 3 since it is negligible (around 10 −8 × N eval,QBX seconds).
For a particle, N grid is the number of grid points on the whole particle, i.e. N grid = 2250 for the rod that we have considered so far. Let us study the special case of a single evaluation point, relevant for example when computing a streamline. Based on Table 3, the evaluation time for this single point can be computed, depending on which quadrature region the point belongs to and the parameter κ Ui or p QBX . This is shown in Table 4. From this, it can for example be seen that the evaluation takes roughly 1000 times longer for a point in the upsampled quadrature region with κ Ui = 2 compared to the direct quadrature region. (The upsampled quadrature cost is in this case completely dominated by interpolating the density, i.e. multiplying it by the precomputed interpolation matrix.) T 2. Computational complexities for the di erent quadrature methods.
Direct quadrature Evaluate Time complexities for evaluating the double layer potential D, excluding precomputation time. Here, • N grid is the total number of grid points on the part of the surface included in the special quadrature method (i.e.Γ as de ned in section 4.2), • N eval,D , N eval,Ui and N eval,QBX are the number of evaluation points in the direct quadrature region, the ith upsampled quadrature region and the QBX region, respectively, • N exp is the number of expansion centres that are to be used for the evaluation points in the QBX region. It can also be seen in Table 4 that QBX is often faster than upsampled quadrature. For instance, QBX with p QBX = 40 takes about as much time as upsampled quadrature with κ Ui = 2 and is faster than any κ Ui ≥ 3. However, note that this conclusion may not hold when there are more than one evaluation point, since the evaluation time depends in an intricate way on both the number of evaluation points in each region and the number of expansions needed for QBX. In particular, if many expansions are needed (large N exp ) and there are few evaluation points per expansion, QBX will tend to be slower than upsampled quadrature due to the large cost of computing coe cients.

Summary of the parameter selection strategy
The parameter selection strategy can, in the general case, be summarized as follows. In all steps, the stresslet identity (5) is used to estimate the error.
Input: Discretization of the geometry, error tolerance ε tol , d QBX , r QBX , p QBX , κ QBX for the special quadrature For each distinct geometrical object: 1. Put κ Ui = i + 1. Numerically determine the threshold distances d Ui to keep the error below ε tol , as in Figure 15 and Table 1, up to the rst i such that d Ui ≤ 2γh, where γ = 0.85 and h is the grid spacing (de ned for particles in equation (54) and for walls in section 6.4).
2. Put r QBX = h. If the last (smallest) d Ui computed in step 1 lies in the interval [h, 2γh], put d QBX equal to it. Otherwise, put d QBX = h. This also sets N U , the number of upsampled quadrature regions.
3. Choose p QBX such that the truncation error is below ε tol , based on a parameter study such as in Figure 17 (b).
Choose κ QBX such that the coe cient error is no larger than the truncation error.
This strategy is designed to keep the error relative to max Γ |q| below ε tol when evaluating the layer potential, provided that the density q is well-resolved by the discretization. While there is no guarantee that the error stays strictly below ε tol , empirical evidence in sections 6.1.3, 6.3, 6.4, 7.1 and 7.2 indicates that the error is typically close to the tolerance, and in any case of the correct order of magnitude. Note that the procedure, including the parameter studies, must be repeated every time a new geometrical object, such as a rod particle with a di erent aspect ratio, is used. We will now apply the procedure to two additional examples.
6.3. Example II: a rod particle with higher aspect ratio For the second example, we consider a more slender rod particle, namely the rod of length L = 10 and radius R = 0.5 (aspect ratio 10) shown in Figure 22. The grid has parameters n 1 = 35, n 2 = 60 and n φ = 18, in total 2340 grid points. The grid spacing as de ned by (54) is h = π /18 ≈ 0.1745. It should be noted that while h is based only on the grid resolution in the azimuthal direction, the resolution in the polar direction must not be much coarser. Otherwise, the distance between QBX expansion centres would be too large in the polar direction, and the balls of convergence would not cover the QBX region. (This limitation is due to having one expansion centre per grid point.) Applying the same constant densityq = (1, 1, 1)/ √ 3, the direct quadrature error is shown in Figure 22  Error in a slice through the rod particle, when evaluating the stresslet identity (5) in free space using (a) direct quadrature, and (b) combined special quadrature with tolerance 10 −6 . In (b), the tolerance is exceeded in 148 points, marked red; the evaluation grid consists of 500 × 500 points and the largest error is 1.455 × 10 −6 . In (a), the 65 lines used to select parameters are shown in red.
in Figure 22 (a) used to select the threshold distances. The parameters for error tolerance 10 −6 are The error when using these parameters is shown in Figure 22 (b). As before, the tolerance is not strictly enforced, but the error stays within a factor 2 of the tolerance.
The parameters (59) for the slender rod can be compared with the parameters (58) for the less slender rod with the same tolerance; the threshold distances are relative to the diameter of the cylindrical part of the rod in both cases. Note that the slender rod (59) has larger threshold distances than the other rod (58). This re ects the fact that the error of the underlying direct quadrature at a xed distance from the rod is higher for the slender rod, since it has lower overall resolution (grid points per surface area). The slender rod also requires a higher p QBX since r QBX = h is larger for the slender rod.

Example III: a pair of plane walls
In this third example, we select parameters for a plane wall. Since we always consider walls in a periodic setting, we will do so here as well, and use the Spectral Ewald (SE) method described in section 5. We will here select the SE parameters such that the error from SE is completely negligible compared to the quadrature errors which we strive to control here. 11 Since the problem is periodic in all three spatial directions, we must have a pair of walls so that the uid domain can be con ned between them. The periodic cell is here of size B = (1, 1, 1) and the two walls are placed at a distance of 0.6 from each other. The walls are discretized using 11 × 11 patches each, with 8 × 8 grid points on each patch (as described in section 3.2), in total 7744 grid points per wall. For a wall, we de ne the grid spacing as h = max(h 1 , h 2 ), where h 1 and h 2 are the largest spacings between grid points in each of the two tensorial directions of the patches. The walls considered here have grid spacing h ≈ 0.01668. The constant densitỹ q = (0, 0, 1) is applied in the direction of the normal of the lower wall (pointing into the uid domain). The direct quadrature error is shown in Figure 23 (a).
We follow the procedure in section 6.2. The threshold distances of the upsampled quadrature regions are computed by evaluating the stresslet identity error along normal lines of the walls, with each line centred at a grid point. The error is plotted in Figure 23  To determine p QBX and κ QBX , we do a parameter study, shown in Figure 24 (a). Note that the plane wall needs a signi cantly lower p QBX than the rod particles to reach a given error. The error curves in Figure 24 (a) level out at around 10 −12 due to other errors not controlled by the QBX parameters. In order to reach the tolerance 10 −6 it is su cient to choose p QBX = 7 and κ QBX = 10. The selected parameters are thus The error when using these parameters is shown in Figure 24

Numerical results
Our numerical method can be summarized as follows: 1. The geometry is discretized as in section 3. Parameters for the combined special quadrature method are selected as in section 6.2. Parameter selection is done in free space for particles, but the same parameters can also be used in the periodic setting. For walls, parameter selection is done in the periodic setting.
2. The matrices M i and R i for o surface QBX and onsurface QBX, respectively, are precomputed as in section 4.3.
Interpolation matrices for the upsampled quadrature regions are also precomputed. At this point, the special quadrature is ready to be used to evaluate the layer potential.
3. The boundary integral equation for either a resistance problem or a mobility problem is solved iteratively for q using GMRES. The Spectral Ewald method is used for periodic problems, as described in section 5. A preconditioner is used in all cases, as described below.
• For a resistance problem, velocities are given for all particles and the boundary integral equation is given by equation (24).
• For a mobility problem, forces and torques are applied to all particles and the boundary integral equation is given by equation (25).
4. The ow eld in the uid domain may be computed in a postprocessing step using q from step 3. For a resistance problem, the forces and torques acting on all particles may also be computed here, while for a mobility problem, the particle velocities may be computed.
To improve the convergence of GMRES in step 3, we use a block-diagonal preconditioner similar to the one used in [25]. The preconditioner is constructed by computing the explicit inverse of a single-particle system as well as a system consisting of a single wall patch (if walls are present in the simulation). These two types of blocks are then placed along the diagonal and rotated according to the geometry. This preconditioner has been seen to reduce the number of GMRES iterations by as much as a factor 17 for some systems with many particles, such as the ones in section 7.3.
In this section, we test some aspects of our numerical method, with focus on the special quadrature. First, in section 7.1, we test the quadrature on its own (i.e. steps 1-2 above) with geometries containing both particles and walls. This serves as a continuation of the tests in section 6, where geometrical objects were considered only separately. In section 7.2, we test the special quadrature in the context of the full numerical method (steps 1-4 above) and in particular how the quadrature tolerance in uences the accuracy. Finally, in section 7.3, we test the computational complexity of the method on a more complicated problem, and compute streamlines.

Special quadrature with composite geometries
We consider two geometrical setups, shown in Figures 25 and 27. Both problems are periodic with a periodic cell of size B = (1, 1, 1), and the Spectral Ewald parameters are as in section 6.4. As in section 6, we use the stresslet identity (5) to estimate the error. This is the same test used to select the parameters, so it mainly serves as a consistency check (tests with nonconstant densities will follow in section 7.2).

Geometry 1: Two rods between a pair of plane walls
The rst geometry consists of two plane walls discretized as in section 6.4, at a distance 0.6 from each other. Between these walls are two rod particles of length L = 0.5 and radius R = L/20 (aspect ratio 10), discretized as in section 6.3 (but scaled down a factor 20), oriented such that their axes lie in the centre plane.
The stresslet identity error is shown for two di erent quadrature tolerances ε tol in Figure 25. In (a), ε tol = 10 −6 , the quadrature parameters for the walls are as in section 6.4, i.e. given by (60); for the rods, the parameters are as in section 6.3 but with all distances scaled by 1/20 to account for the di erence in size. Thus, the parameters for  Figure 26. This shows that the error more or less follows the tolerance, as expected. 25. Stresslet identity error in the centre plane, for geometry 1, in (a) for tolerance 10 −6 and in (b) for tolerance 10 −8 . The largest error is 1.205 × 10 −6 in (a) and 9.389 × 10 −9 in (b). In (a), the error exceeds the tolerance in 2 points (the evaluation grid has 500 × 500 points), marked red. Maximal and root-mean-square (RMS) stresslet identity error in the centre plane as a function of the special quadrature tolerance, for geometry 1. As observed already in section 6, the tolerance is sometimes exceeded slightly at the threshold distances, which causes the max error curve to lie above the identity line Error = Tolerance (dashed).

Geometry 2: Two spheroids in a pipe
The second geometry consists of a pipe of radius 0.3, discretized using 5 × 10 patches with 6 × 6 grid points each. Inside the pipe are two spheroids with semiaxes a = 0.05 and c = 0.1, discretized with parameters n θ = 36 and n φ = 25 (900 grid points per spheroid).
We select the error tolerance ε tol = 10 −6 . The parameters selected according to section 6.2 are, for the pipe d QBX = 0.0614, p QBX = 12, κ QBX = 10 and N U = 2, d U1 = 0.222, d U2 = 0.0888; and for the spheroids d QBX = 0.0153, p QBX = 27, κ QBX = 15 and N U = 2, d U1 = 0.0568, d U2 = 0.0235. The error when using these parameters is shown in Figure 27 (b), together with the direct quadrature error in Figure 27 (a). Note that we have selected a much lower resolution for the pipe in comparison to the walls in geometry 1 (section 7.1.1), which is re ected in the larger threshold distances compared to (60).  27. Stresslet identity error in the centre plane, for geometry 2, in (a) using direct quadrature and in (b) using combined special quadrature with tolerance 10 −6 . In (b), the tolerance is exceeded in 5 points (the evaluation grid has 500 × 500 points), marked red; the largest error in the slice is 1.323 × 10 −6 .

Solving the boundary integral equation
Here, we investigate how the special quadrature tolerance in uences the accuracy of the numerical method, i.e. when solving the boundary integral equation. We will use the mobility problem as our model problem, and apply the force F = (0, 0, −1) to all particles, with zero torque and no background ow. In order to get the expected accuracy when solving the boundary integral equation, the double layer density must be well-resolved by the geometry discretization. It turns out that for elongated particles, the density and how easy it is to resolve depends heavily on the number of completion sources N src (de ned in section 2.1). Therefore, we begin in section 7.2.1 by investigating how large N src must be to ensure that the density is well-resolved for a given discretization. Then, in section 7.2.2, we study how the special quadrature tolerance in uences the accuracy of the method.

Selecting the number of completion sources
To study the in uence of N src , we consider a single rod particle with length L = 0.5 and radius R = L/20 (aspect ratio 10) in free space, shown in Figure 29 (a) together with the ow eld resulting from the force F = (0, 0, −1). We now solve this mobility problem for varying N src , with the special quadrature error tolerance xed to 10 −9 here. 12 The completion ow V (α ) , which appears in the right-hand side of the boundary integral equation (20), will change drastically as N src grows from small values, as shown in Figure 28; the completion ow becomes increasingly smoother as N src increases. Naturally, this means that the density q itself will change as N src grows. However, the real physical quantities -the particle velocity and the ow eld -should not change since the net force and torque on the particle does not change. Thus, these physical quantities can be used to gauge how N src a ects the accuracy of the solution. As Figure 29 (b) shows, the e ect is quite large, and most pronounced in the uid ow velocity (the blue curve). (For this problem, the magnitude of the uid ow velocity and particle velocity U RBM are both around 0.6, while the angular velocity Ω RBM is zero.) Thus, it is important to select N src high enough for the error in Figure 29 (b) to satisfy the error tolerance. The e ect of N src on the accuracy is stronger the more elongated the particle is; for particles with low aspect ratio, N src = 1 may be su cient. The e ect is very similar for the resistance problem, to the degree that the max ow error in Figure 29 (b) can be used to determine N src for both the mobility problem and resistance problem. Note that N src does not a ect the size of the linear system, i.e. (24) or (25). Since the background ow is zero, this is exactly the right-hand side of the boundary integral equation (20). Note that the colour scale is di erent for N src = 1 compared to the other values.  29. (a) Flow eld resulting from the mobility problem for a single rod in free space (colour indicates velocity magnitude, small black arrows indicate velocity direction). The large red arrow indicates the applied force, and the large black arrow indicates the velocity of the rod (not to scale with the small arrows). (b) Contribution to the absolute error from the way the completion sources are distributed, as a function of N src (for a rod particle of aspect ratio 10). The error is estimated as the di erence to a reference solution with N src = 135. Note that the max ow error attens out around 10 −9 , the special quadrature error tolerance.

E ect of the special quadrature on the accuracy
We continue to study the mobility problem, but now add another rod particle and a pair of plane walls, as shown in Figure 30. We x N src = 65, which was enough to get the error below 10 −9 in the previous problem. The walls are discretized using 22 × 22 patches with 8 × 8 grid points each (30 976 grid points per wall), and the rod particles are discretized as in section 7.1. We set the special quadrature tolerance to di erent values between 10 −1 and 10 −8 , solve the mobility problem, and compute the ow eld and particle velocities. The errors in the ow eld, density and particle velocities are estimated using a reference solution with special quadrature tolerance 10 −9 ; these are shown in Figures 30 (b) and 31 (a)-(b). Note that the tolerance sets the ow eld error relative to max |q| quite accurately; the density and particle velocity errors are even smaller. We would like to point out that the value of the scale factor max |q| is not known a priori, but it is of course known after having solved the boundary integral equation.
It should be noted that the error cannot be expected to follow the tolerance unless the density is well-resolved by the geometry discretization, since otherwise the interpolated density will be inaccurate. It has been observed that the density becomes hard to resolve, with either sharp peaks or high-frequency oscillations, when particles come very close to each other or the walls (where "very close" is measured relative to the grid resolution). Thus, one may be forced to increase the grid resolution in these cases.
For elongated particles, the set of matrices M i , i = 1, . . . , n θ /2, which are precomputed for o surface QBX tends to become quite large for strict quadrature tolerances. The reason for this is that with particle-global QBX, the whole set of matrices consists of 3n φ n 2 θ (p QBX + 1)(p QBX + 2) complex numbers, i.e. it is quadratic in both p QBX and n θ (where n θ is the number of grid points in the axial direction,   Figure 30 (a)), estimated using a reference solution with tolerance 10 −9 . (b) Absolute error of the particle velocities.
which we de ne as 2n 1 + n 2 for rod particles). For elongated particles, n θ tends to be large; for example, for the rods considered in this section (aspect ratio 10), n θ = 130 and n φ = 18. The set of matrices for tolerance 10 −8 (p QBX = 40) then takes up around 25 gigabytes when stored in double precision, while for tolerance 10 −6 (p QBX = 28) the matrices take up around 13 gigabytes. To reduce the size of the matrices in this situation, a local patch-based discretization could be used also for the particles, in the same way it is already used for the walls. This would reduce the number of grid points included in the special quadrature and thus the size of the matrices. The computational cost of our special quadrature method is quadratic in the number of grid points per particle (or patch), but linear in the number of particles (patches) if their discretization is kept xed. For the Spectral Ewald method, the computational cost scales like O (N log N ), where N is the number of unknowns in the system (i.e. three times the total number of grid points), assuming that the number of grid points within a ball of radius r c does not change. In other words, for xed grid resolution and particle concentration, the time required per GMRES iteration when solving the boundary integral equation is expected to scale like O (N log N ).
To test this scaling, we consider a problem with many rod particles con ned in a pipe, shown in Figure 32: one segment (a) consists of a pipe segment of radius 0.3 and length 0.2 con ned in a periodic cell of size B = (0.2, 1, 1), with 20 rods of length L = 0.25 and radius R = L/12 (aspect ratio 6) inside the pipe. 13 This segment is replicated to create a longer pipe, up to 12 times the original length (shown in (b)), with the same grid point concentration as the original segment. For 1, 2, 3, . . . , 12 segments, we solve a resistance problem in which all particles are stationary and a quadratic background ow is applied, where A = 0.3 is the radius of the pipe and (x 2 , x 3 ) = (0, 0) is its centre line. As seen in Figure 33 (a), the time per GMRES iteration follows the expected scaling O (N log N ). Since the structure of the linear system changes as the number of segments n s grows, the number of GMRES iterations grows slightly with n s . However, this growth is slow enough for the total solving time to also follow the scaling O(N log N ), as shown in Figure 33

Streamline computation
In the postprocessing step (step 4 of the method summary), streamlines may be computed to visualize the ow eld. When using the Spectral Ewald method, the Fourier-space part on the uniform grid can be reused to reduce the computation time, as described in appendix D. Here, we compute streamlines for a periodic resistance problem with 100 rods in a pipe segment of length 1, otherwise identical to the problem in section 7.3.1 (including all parameters). Figure 34 shows 95 streamlines; a typical streamline consists of around 3000 points and takes around 2 minutes to compute (i.e. around 0.04 seconds per time step). A slice of the same ow eld is shown in Figure 35.

E ects of nonsmooth geometries
So far, all geometrical objects considered in this paper have been smooth. In fact, special care has been taken to ensure that the rod particles, constructed in appendix B, are everywhere smooth. The reason is that, as noted in [14], the convergence of the local expansions used in QBX depends on the smoothness of the boundary close to the expansion centre. In this section, we demonstrate this using two di erent rod particles: one smooth and one nonsmooth, shown in Figure 36 (a). The rods are both of length L and radius R, but the smooth rod is constructed as in appendix B, while the nonsmooth rod consists of a cylinder of radius R and length L − 2R joined to two halfspherical caps of radius R. The nonsmooth rod is thus of class C 1 , since the curvature is discontinuous where the cylinder meets the spherical caps.
To illustrate the convergence of QBX, consider rods with L/R = 20 (aspect ratio 10), discretized using n 1 = 35, n 2 = 60, n φ = 18 as described in section 3.1. In Figure 36 (b), the onsurface QBX stresslet identity error is plotted as a function of p QBX , in the same way as in section 6.3 (where this was done for the smooth rod). Clearly, the convergence with respect to p QBX is much worse for the nonsmooth rod compared to the smooth one. The reason for this can be seen in Figure 37: the error decays extremely slowly close to the boundary between the cylinder and the caps, where the curvature is discontinuous. This is clearly a local e ect, since the convergence is ne a little bit away from the discontinuity. It should be noted that it is entirely possible to use QBX on a nonsmooth geometry, but it requires special measures to be taken. In [29], QBX was applied to a geometry with a corner. In that example, the discretization was dyadically re ned around the corner, to ensure that the layer potential appears locally smooth on the scale of the discretization. The same approach could likely be taken also for the nonsmooth rod particle, i.e. re ning the grid dyadically around the discontinuity. However, constructing the rod to be smooth in the rst place has a clear advantage in this case, since no grid re nement is needed.

Conclusions
We have presented a numerical method based on a boundary integral formulation that can be used to simulate rigid particles in Stokes ow with con ning walls. A parameter selection strategy has also been presented for the combined special quadrature used in this method. We have demonstrated that the error of the method is controlled by the special quadrature tolerance as long as the layer density is well-resolved, and that the method scales as O(N log N ) in the number of unknowns N for xed grid point concentration. This makes it possible to simulate systems with a large number of particles. The method can deal with particles and walls of di erent shapes; we have here considered spheroids, rod particles, pipes and plane walls, but it is straightforward to extend the method to any smooth geometry with su cient symmetry.
The method could be further improved for example by using local patch-based quadrature for elongated particles to reduce the size of the QBX matrices, and allowing the size of the wall patches to be set adaptively so that the resolution can be focused where particles are close to the wall. It could also be useful to allow parameters such as p QBX to vary along the particle surface (in response to di erences in the convergence rate of the local expansions, as seen e.g. in Figure 37), and to allow the expansion centres for QBX to be placed independently of the grid points of the discretization, so that the centres can be placed closer to the surface in order to decrease the expansion order p QBX . Furthermore, if analytical quadrature error estimates were available, these could replace the numerical experiments used to select threshold distances and the QBX upsampling factor.

A.2. A pipe
Let nowΓ be an in nitely long pipe given by the equation x 2 2 + x 2 3 = a 2 for some a > 0, as shown in Figure 39. Let the domain inside the pipe be denoted by Ω. We shall show that for any constant vectorq ∈ R 3 , the identity (63) holds. Let us introduce cylindrical coordinates and write x = x 1 e 1 + re φ and = 1 e 1 + ae θ for the evaluation point and integration variable, respectively. The unit vectors are given by e 1 = (1, 0, 0), e φ = (0, cos φ, sin φ) and e θ = (0, cos θ, sin θ ), and r ≥ 0. Using the fact that the normal vector is given by n( ) = −e θ , we can write the double layer potential from (3) as where (e θ ) k denotes the kth component of e θ . Using the variable substitution 1 − x 1 = u, we can eliminate x 1 . Writing out the stresslet T from (3), and using a few trigonometric identities, the integral in (72) can be written as At this point it is not immediately apparent that the integral which we have called I i j (r , φ) is independent of φ, but that does indeed turn out to be the case. We expect the o diagonal elements of I i j to be zero, which can be veri ed by rst integrating in u and then in θ . It thus remains to compute the diagonal elements of I i j .
To compute I 11 , rst integrate in u using the formula θ ∈ I 3 = [2π /3, π ]. Let the length of each cap be L cap , as shown in Figure 40. The ratio L cap /R determines the aspect ratio of the cap. Here, we x this ratio by setting L cap to which gives the cap a shape similar to a half-sphere. The length of the middle cylinder is then However, note that the derivation below is valid for any value of L cap ∈ (0, L/2), with L mid = L − 2L cap . Let us for xed L and R de ne (θ ) = ( 1 (θ ), 2 (θ )) = (ϱ(θ ; L, R), β(θ ; L, R)). For the middle cylinder, the parametrization is 1 (θ ) = R, 2 (θ ) = 1 − 3 π θ L mid + L mid 2 , θ ∈ I 2 = [π /3, 2π /3].
Note that 2 is simply an a ne function of θ . At the endpoints of the interval I 2 we have Our goal is now to extend the parametrization (θ ) to I 1 and I 3 in a way such that the unit tangent vector (θ )/| (θ )| and its higher derivatives are continuous everywhere. As an intermediate step we introduce an auxiliary function (t) = ( 1 (t), 2 (t)) with a di erent parameter t ∈ [−1, 1]. The function should trace the curve from C to B via A in Figure 40, with C corresponding to t = −1, A corresponding to t = 0 and B corresponding to t = 1. We will later relate t ∈ [0, 1] to θ ∈ [0, π /3] to get the nal parametrization. At this point, note that to match (88) we must require where b is some positive constant. In order to construct (t) we will use a bump function ψ : R → R, which must satisfy the following: • ψ must be in nitely di erentiable on R, • ψ must have compact support in [−1, 1], i.e. ψ (t) = 0 if t > 1 or t < −1, • ψ (t) must be positive for t ∈ (−1, 1), • ψ must be even, i.e. ψ (t) = ψ (−t) for all t ∈ R.
We also introduce its primitive function which is an odd function since ψ is even. We choose a speci c bump function, namely 14 This function has the primitive function We now construct (t) as which satis es (89). We can determine b by noting that we must have 2 (0) = L/2 (at point A in Figure 40), which yields b = L cap Ψ(1) The integrals of Ψ in (93) and (94) are computed numerically using MATLAB's integral function. Finally, we go from the parameter t to the parameter θ . We would like the discretization points to be distributed as Gauss-Legendre points in the arclength, and so we must choose θ so that it is proportional to the arclength on the caps. Consider the arclength Let us then de ne and note that this de nes θ ∈ I 1 = [0, π /3] as an invertible function of t ∈ [0, 1]. We can now de ne (θ ) = (G(t)) = (t) for t ∈ [0, 1], and thus (θ ) = (G −1 (θ )), The bottom cap should be the re ection of the top cap in the plane corresponding to β = 0, so Now that we have de ned (θ ) for all θ ∈ [0, π ], its two components 1 and 2 correspond to the shape factors ϱ(θ ; L, R) and β(θ ; L, R), respectively, which are to be used in (84).

Appendix C. Derivation of the safety factor γ
Recall from section 6 that one may want to select d QBX larger than r QBX since QBX may be faster than the upsampled quadrature due to the precomputation scheme. Let us call the set of points of the QBX region with distance to Γ greater than r QBX the upper QBX region, and the set of points with distance to Γ smaller than r QBX the lower QBX region, as shown in Figure 41 (a). As noted in section 6.1.2, putting d QBX = 2r QBX would lead to some areas of the upper QBX region not falling within any ball of convergence. To avoid this, we introduce a safety factor γ and require that  Figure 18), seen from the side. (b) Grid points of Γ, seen from above. Here, h is the largest spacing between grid points in each tensorial direction.
The goal here is to derive the value of the safety factor γ . We assume for simplicity that Γ is a at surface.
The key is to choose d QBX below the intersection of neighbouring balls of convergence, marked by the point C in Figure 41 (a). Since the grid on Γ is two-dimensional, the largest distance between neighbouring grid points is not h but √ 2h, where h is as shown in Figure 41 (b). The four balls of convergence of the expansion centres above the grid points D-G in this gure intersect at distance It is thus su cient to require that d QBX ≤ d . Comparing (101) and (99), we see that the safety factor should be This derivation holds when Γ is a at surface, in which case the requirement (99) with γ = 0.85 guarantees that all points in the upper QBX region fall within a ball of convergence, as long as h ≤ r QBX . If Γ is curved, this guarantee holds on the concave side of Γ, but not necessarily on the convex side, where d QBX may have to be even smaller for the guarantee to hold. Nonetheless, we use (99) with γ = 0.85 also for convex surfaces such as rods and spheroids, and it seems to work well in practice. Of course, the parameter selection strategy (section 6.2, step 2) will in most cases choose d QBX less than the upper bound 2γr QBX .

Appendix D. E cient computation of streamlines in periodic ow
To compute streamlines in a periodic problem such as in section 7.3, we must rst solve the periodic boundary integral equation as described in section 5 to get the density q on Γ. We can then compute the ow eld u(x e ) = u bg (x e ) + D 3P [Γ, q](x e ) + Of course, (104) is discretized using some timestepping method, which must evaluate (103) at every timestep. Recall that the periodic double layer potential D 3P is split into two parts and similarly for V (α ),3P . The rst part D 3P,R decays fast and is treated according to section 5. The second part D 3P,F decays slowly in real space, but since it is smooth its Fourier coe cients decay fast. In the Spectral Ewald method, D 3P,F as given by (49) is rst discretized using the direct quadrature rule (22) to give This is a periodic sum of point sources with strengths Z jl (x s ) = q j (x s )n l (x s )w s . The Spectral Ewald method [23,31,32] computes the periodic sum (106) in ve steps: 1. Spreading point sources to a grid: A three-dimensional uniform grid is constructed over the primary cell. A window function W (r ) is convolved with the point sources in the primary cell to give Here, [·] * denotes that the shortest periodic distance should be used, i.e.
[r ] * = r + B · arg min is the size of the periodic cell. In this work the window function is a truncated Gaussian, given by W (r ) = w(r 1 )w(r 2 )w(r 3 ), where w(r ) = e −A(r /r trunc ) 2 , if |r | ≤ r trunc = h g P/2, 0, otherwise.
Here, h g is the grid spacing of the uniform grid, P is the number of grid points within the support of w, and A = 0.9 2 πP/2. The parameter P is chosen as discussed in [23]. It is also possible to use other window functions than the Gaussian, as discussed for example in [48].
The function H jl (x) as given by (107) is evaluated on the uniform grid.

FFT:
The three-dimensional Fourier transform H jl (k) is computed using the FFT. This is possible since H jl (x) is de ned on a uniform grid.
3. Scaling: The result is multiplied by the Fourier transform of T F , and divided by the Fourier transform of the window function W to undo the convolution in step 1. Since we will convolve again in step 5, this division is done twice. Thus, we here compute where as given in [23].
where B denotes the primary cell. The integral in (112) is evaluated using the trapezoidal rule on the uniform grid, which is spectrally accurate since the integrand is periodic.
Since the density q does not change during the computation of the streamlines, and the evaluation point x e enters only in step 5 above, it is possible to do step 1-4 once before starting to compute the streamlines, and save H i (x) on the uniform grid from step 4. When the Fourier-space part D 3P,F [Γ, q](x e ; ξ ) is to be evaluated at x e (t) at every timestep of solving (104), it is then enough to do only step 5. This speeds up the computation of the streamlines since evaluating (112) is fast for a single evaluation point. The real-space part D 3P,R [Γ, q](x e ; ξ ) must be computed from scratch at every timestep, but this is fast since it is a local sum due to its rapid decay. The periodic completion ow V (α ),3P which appears in (103) is treated in a very similar way; for details, we refer to [31,23,24]. Note that steps 1-5 of the Spectral Ewald method are also what is used when solving the periodic boundary integral equation as described in section 5, but in that situation all the evaluation points (i.e. the grid points of Γ) are known in advance so they can all be fed into step 5 at the same time.