Efficient viscosity contrast calculation for blood flow simulations using the lattice Boltzmann method

The lattice Boltzmann method (LBM) combined with the immersed boundary method is a common tool to simulate the movement of red blood cel ls (RBCs) through blood vessels. With very few exceptions, such simulations neglect the difference in viscosities between the hemoglobin solution inside the cells and the blood plasma outside, although it is well known that this viscosity contrast can severely affect cell deformation. While it is easy to change the local viscosity in LBM, the challenge is to distinguish whether a given lattice point is inside or outside the RBC at each time step. Here, we present a fast algorithm to solve this issue by tracking the membrane motion and computing the scalar product between the local surface normal and the distance vector between the closest LBM lattice point and the surface. This approach is much faster than, for example, the ray‐casting method. With the domain tracking applied, we investigate the shape transition of a RBC in a microchannel for different viscosity contrast and validate our method by comparing with boundary‐integral simulations.

A RBC consists of a thin elastic membrane surrounding the interior hemoglobin solution which to a good approximation can be viewed as a Newtonian liquid with a viscosity about five times larger than the surrounding blood plasma. 10,18 This viscosity contrast is essential for the RBC dynamics. [19][20][21][22][23][24][25][26] Depending on the numerical technique, it can be more or less tedious to include the parameter into numerical simulations. In boundary-integral methods (BIM), the consideration of a viscosity contrast is conceptually straightforward, although computationally costly. [27][28][29][30] Particle methods such as smoothed dissipative particle dynamics (SDPD) are able to include viscosity contrast, 7 although this is not always done. 5,31 In finite volume methods, a viscosity contrast has recently been included using an indicator function advected with the fluid. 32 One of the most widely used techniques in blood flow simulations is the combination of the Eulerian lattice Boltzmann method (LBM) 33 for the flow and the Lagrangian immersed boundary method (IBM) [34][35][36][37] for the fluid-structure interaction with the RBC membrane. Although it is simple to locally change the fluid viscosity in LBM, the key difficulty is to determine whether or not a given LBM lattice point is located inside or outside a RBC in order to assign the correct viscosity. A standard solution to such inside/outside problems consists of tracing a beam originating from the point of interest and counting the number of RBC membrane crossings (ray-casting). Carrying out this determination for every time step during a simulation, where cells move and deform dynamically, would clearly make the ray-casting approach far too computationally expensive (except in 2D 38 ). In 3D, cluster algorithms 39 and volume-of-fluid like methods 40 can be expected to be relatively costly. Some recent works 39,41,42 therefore raised the general idea of tracking membrane vertices in order to determine the interior volume of a moving cell. Nevertheless, the vast majority of LBM-IBM simulations of RBC flow still neglect viscosity contrast altogether.
Here we provide a detailed description and systematic analysis of such a tracking algorithm to include viscosity contrast of red blood (and other) cells into LBM-IBM simulations. Given an initial configuration in which the inside/outside status of each LBM lattice point is known, the first step of the algorithm identifies those LB points which could potentially, due to motion/deformation of the RBC membrane, switch from inside to outside or vice versa. This task is greatly simplified in the present case since for LBM-IBM methods the typical distance between membrane vertices is kept similar to the LBM grid distance even during large deformations of the cell, as we verified by postprocessing a large set of our existing simulation data. It thus proves sufficient to consider only those LBM points which are in the immediate vicinity of membrane vertices. In the second step, geometrical considerations allow us to determine for each LBM lattice point whether or not the inside/outside flag needs to be switched for the next time step. Every few hundred time steps an efficient heuristic correction step is carried out to remove spurious errors. We thus obtain a highly accurate and highly parallelizable tool for RBC simulations with viscosity contrast using the LBM-IBM approach. We validate our tool by studying the croissant-slipper transition for a RBC in a rectangular microchannel as a function of for which we find very good agreement with highly resolved BIM simulations. 11

Lattice Boltzmann for the fluid
The LBM 33,43 is a powerful mesoscopic fluid solver. The fluid is represented by a discrete set of particle populations f i along fixed directions ⃗ c i located on a Cartesian lattice. For concurrency reasons, two copies f c i and f s i need to be stored in memory for the collision and streaming steps. The so-called equilibrium populations f eq i are only temporarily held in register. and ⃗ u denote the density and velocity of the fluid for every lattice point and c s = 1 √ 3

Δx
Δt is the lattice speed of sound with the grid distance being denoted as Δx and the time step as Δt . In each time step, the populations stream into neighboring lattice points where they collide and are redistributed into the streaming directions for the next step. In a nutshell, LBM can be written down as just five equations: 2. Collision (MRT operator) (2) . (3) F I G U R E 1 Two-dimensional illustration of the immersed boundary method. Membrane vertices can be located anywhere in between lattice points [Colour figure can be viewed at wileyonlinelibrary.com] In Equation (4) M is a transformation matrix into moment space and S is a diagonal matrix containing all relaxation times. The kinematic shear viscosity of the simulated fluid is The relaxation time can be different at every lattice point, which makes it possible to change locally in space. The tracking algorithm presented below uses this possibility to implement a viscosity contrast between the interior and the exterior of a flowing cell according to a flag lattice.
Our simulations are based on the LBM implementation of the simulation package ESPResSo 44-46 which uses a multi-relaxation-time collision operator and halfway bounce-back conditions for the solid boundaries. With an additional volume force term in the collision operator following the Guo scheme, 47 a persisting flow is created.

IBM for cell membranes
The IBM 34,48,49 enables the Lagrangian movement of a tesselated membrane along with the LBM velocity field and couples back membrane forces into the fluid. The membrane vertices can move freely between lattice points ( Figure 1) and their movement is coupled to the lattice in two ways: To obtain the velocity for advecting a membrane vertex and the velocity of the surrounding lattice points is interpolated. Then the elastic forces between membrane vertices are calculated (see below) and the force for each vertex is spread across all nearby lattice points as an additional local volume force term in LBM. A necessary requirement for this two-way coupling is that the distance between membrane vertices is at the same scale as the distance between neighboring lattice points in order to prevent "holes" in the membrane for too large and bad velocity interpolation for too small vertex spacing.

RBC model
To illustrate an application of our tracking algorithm, we will present below investigations of the behavior of a RBC with different viscosity contrasts flowing in a microchannel. Figure 2 visualizes how we generate the RBC surface by recursively splitting the faces of an icosahedron, following Loop's subdivision surface scheme. 50 The triangle vertices are ordered such that all surface normals point outwards, which is a requirement of our tracking algorithm. For the RBC membrane mechanics, we employ the standard model described in more detail in References 11 and 18 and many other works. Briefly, elastic forces arising from membrane deformation are due to shear elasticity, area dilatation and bending forces. The former two are modeled via the empirical Skalak law 51 while bending forces are computed from the Helfrich model using the method denoted "B" in References 52 and 53, originally developed by Gompper and Kroll. 54

Strain energy
For a small element of a 2D membrane with the dimensions dx 1 and dx 2 along the x 1 and x 2 axes, the expansion ratios 1 ∶= F I G U R E 2 Generation of the tesselated red blood cell surface visualized. The shape starts as an icosahedron in (A), then in (B) to (E) the triangles are split recursively to increase resolution and finally the top and bottom caps of the sphere are dented inwards to produce the characteristic RBC shape in (F). The sphere in (E) and the RBC in (F) have 5120 triangles each [Colour figure can be viewed at wileyonlinelibrary.com] are defined. The strain invariants I 1 and I 2 then are and the strain energy E S following the Skalak model 51 only depends on these invariants: 18,55 The constants B and C are material properties of the membrane.

Bending forces
The idea is to calculate the bending force ⃗ F for each membrane vertex (i) at position ⃗ from the Helfrich bending energy E B 53,56 whereby B denotes the bending modulus, S denotes the instantaneous smooth surface, and N denotes the number of membrane vertices. The local mean curvature H of the RBC surface is calculated as with the approximation for the the Laplace-Beltrami operator Δ S of Gompper and Kroll: 52,54 where the superscript (j) denotes the index of membrane vertices adjacent to the membrane vertex (i) and A (i) Voronoi is the area of the Voronoi cell containing (i) as illustrated in figure 3: F I G U R E 3 A sketch of a membrane vertex ⃗ x (i) with six neighbors to illustrate the naming conventions [Colour figure can be viewed at wileyonlinelibrary.com]

F I G U R E 4 A cut-open red blood cells on the Boltzmann
method lattice with all points on the inside marked blue

ALGORITHM FOR TRACKING INSIDE AND OUTSIDE
The goal of the following algorithm is to calculate a flag lattice with the same dimensions as the LBM lattice which for each lattice point contains the boolean information whether the lattice point currently is inside or outside a cell. Our method consists of three steps: (i) an initialization step where the flag lattice is filled once depending on a prescribed initial geometry, (ii) a highly efficient update step where only the LBM points neighboring the cell membrane are evaluated, and (iii) a correction step employed every few hundred steps which removes spurious artifacts introduced by the update step using a simple set of heuristic rules. Figure 4 gives an impression on how large the LBM domain for a RBC typically is.

Initialization step
If the geometry of the cell is known, the analytic condition for the surface can be applied to every lattice point. For example, a point with the coordinates x, y, z is within a sphere of radius r if x 2 + y 2 + z 2 ≤ r 2 . For a RBC with larger radius R, a similar condition 57 has been formulated based on experimental data: This approach exhibits the same efficient scaling of the total compute time t with the LBM lattice size N (t ∼ N 3 ) as the LBM itself.
It is also possible to load an initially deformed state of the RBC (no analytic condition) from a previously generated checkpoint file. These checkpoint files are periodically generated during simulation and stored on the hard drive in case there is a crash or power outage. If no analytic condition for the initial geometry is known, we use a ray-cast-based algorithm (also known as crossing number algorithm 58 ). As this method is only used once for initialization, its slow F I G U R E 5 In the update step, only the lattice points next to the surface are considered (radius of lattice points around each membrane vertex r shell = 1 ). These points are located both inside and outside of the cell surface. The image shows a cell (cut in half for visualization) with the nearest lattice points to the surface marked as blue dots. Only for this shell of lattice points the decision needs to be made. The state of all other points which are far away from the cell surface has already been determined by either the initialization or the previous update steps performance does not affect the overall speed of the simulation. For every lattice point, a ray from the lattice point in an arbitrary direction * is generated. Then the number of ray-triangle intersections is calculated using the Möller-Trumbore intersection algorithm. 59 If this number is odd, the lattice point is inside the cell, otherwise it is outside. This approach is very calculation intensive: it scales with the number of LBM lattice points (N 3 ) times the number of triangles T. The number of triangles is directly proportional to the total surface area of the RBC. Due to the requirement that the distance between two neighboring points on the RBC surface should be approximately the same as the distance between two neighboring points on the LBM lattice, we have T ∼ N 2 . The total scaling for the ray-casting approach is thus t ∼ N 3 T ∼ N 5 .
AC++ implementation of the initialization with ray-casting is listed in Appendix A.

Update step
Because of their bad performance and bad parallelizability, ray-casting algorithms are only suitable for initialization and not for the regular update happening at each time step. For the update step to be more efficient, we therefore implement a membrane tracking procedure as follows. In each time step, we only consider LBM lattice points next to the cell surface ( Figure 5). For these points, we calculate the distance vector to the RBC surface and compare it to the normal vector on the surface via a scalar product, resulting in a negative or positive value discrimnating inside from outside. Four arrays need to be allocated once at simulation startup. These store the normal vector for each membrane vertex (A), a list of all lattice points which are close to the surface and whose inside/outside flag may need to be updated (update list, (B)) as well as the index and distance of the closest membrane vertex ((C) and (D), respectively) for each lattice point. The algorithm consists of three consecutive loops: 1. In the first loop, we compute for each membrane vertex the normal vector as the average of the normal vectors of all adjoining triangles weighted by their area and store it in array (A). 2. The purpose of the second loop is to fill the update list (B), that is, to find a shell of lattice points located around the membrane. Therefore, for each of the membrane vertices the eight closest lattice points (r shell = 1 ) are determined via integer casting. The distance from each of these to the membrane vertex is calculated. If this distance is smaller than the distance stored in the array (D) entry for the lattice point, the distance in array (D) is updated, the membrane vertex index is stored in array (C) and the 3D position indices i, j, and k of the lattice point are stored in a the update list (B). (B) may contain some lattice points more than once, which is not a problem however, because the last entry for a given point will always be the one of the closest membrane vertex. 3. The third loop goes through the update list (B). The indices of the lattice points and their closest membrane vertices are fetched from (B) and (C). Then, the vector from the membrane vertex to the lattice point and the previously calculated *Mathematically, the direction of the ray is arbitrary. However, if the ray passes exactly through the edge of two adjacent triangles, the intersection count is increased by two instead of one, which leads to the algorithm failing. The solution to the problem is statistical exclusion by pointing the ray in a direction that cannot be represented by floating point numbers. For example, a ray in the direction (1.040.030.01) T will never intersect the edge of a triangle, while a ray in the direction (100) T probably will. Alternatively, multiple rays in different directions can be computed.

F I G U R E 6
The special case where our algorithm fails visualized in two dimension. A point is located below the plane perpendicular to the closest vertex normal (yellow line, > 90 • ) and therefore activated even though it is actually located outside of the cell volume. This can only occur if the angle between triangles is small, otherwise the correct vertex (green line with the angle ) would be closer normal vector of the membrane vertex from (A) are compared via scalar product. A negative result defines the lattice point as inside the cell.
The main advantage of our proposed method is its computational speed and parallelizability. Our method is solely based on the knowledge of the surface geometry and also works when only parts of the surface are known in different simulation domains, which makes it ideal for multi-CPU parallelization. The update step scales with t ∼ (2r shell ) 3 T ∼ N 2 , which is considerably faster than LBM and thus does not impose any notable performance penalty on the simulations. In Appendix B a C++ implementation as used in ESPResSo is listed. The only prerequisite is that the maximum distance between membrane vertices (using a r shell = 1) should be smaller than two times the lattice constant in order to prevent holes. As pointed out above, this condition is automatically ensured for almost all membrane vertices by the LBM-IBM algorithm. Furthermore, the vertex indices of all triangles must be ordered such that all surface normals point outwards.
In the rare case of an extremely crumpled surface, when the angle between neighboring triangles is smaller than 90 • , our algorithm can sometimes fail (see Figure 6 for details). Moreover, points outside of the cell which have falsely been "activated" (believed to be inside) due to this error may stay activated if the cell has moved away and the point is out of reach of the algorithm. This can result in the cell dragging a tail of activated points behind. To remove these spurious artifacts, an additional correction step every few hundred regular steps is required as described in Section 3.3 below.
Variants of the algorithm with a wider radius r shell > 1 of lattice points around the cell membrane have also been tested, for example using the closest 4 3 points instead the closest 2 3 points to a membrane vertex. A wider radius of lattice points vastly slows down the algorithm, as more points need to be processed. In addition, the lattice points are then further away from the surface, increasing the risk of failure due to the surface curvature. A wider radius is only useful if the cells membrane is triangulated sparsely compared to the lattice point density, in which case r shel l = 1 would result in an excessive amount of holes. Given a sufficient membrane vertex density though, one can avoid the large computational overhead of a wider radius.
There is another similar variant, update via face normals, which in the second loop instead of membrane vertices uses the centers of the triangles with their direct normal vectors. This variant however is much more prone to errors when the surface is crumpled, since there the normal vectors are used directly and are not averaged over five to six triangles. Furthermore, although the scaling of t ∼ N 2 is the same, it is only about half as fast compared to the update via vertex normals described above, because there are only about half as many membrane vertices as triangles.

Correction step
Since the update step from Section 3.2 occasionally creates artifacts as described in Figure 6, an additional correction step is required to run once every few hundred regular steps. The correction algorithm loops through all lattice points of the inside/outside flag lattice and for each point counts the number of "activated" (believed to be inside) neighbors. Any given point can have a maximum of 26 activated neighbors. Our correction algorithm (i) detects activated lattice points with too few active neighbors and deactivates them, as well as (ii) detects deactivated points with too many active neighbors and activates them. The reason for this is the assumption that the surface -or the boundary between activated and deactivated lattice points-is locally smooth, so an activated lattice point on the boundary ideally has not much less than 13 activated neighbors and a deactivated point on the boundary has not much more than 13 activated neighbors. Figure 7 illustrates both possible corrections. We impose that activated points need at least a = 8 activated neighbors to stay activated, which is approximately 30 % of the maximum number of neighbors. Deactivated points need less than d = 18 activated neighbors to stay deactivated, which is approximately 70 % of the maximum number of neighbors. The thresholds are chosen empirically with test runs so that the effectiveness of the error correcting step is maximized. If the thresholds were much lower, too few corrections would occur. If the thresholds were higher, the true cell surface would become eroded.
For lattice points on the side, edge or in the corner of the simulation box (or the local CPU domain), the maximum number of neighbors available in local memory is lower. In these cases, the thresholds are scaled down linearly with the maximum neighbor count, which is equivalent to extrapolating the missing neighbors. This avoids the overhead of having to implement a halo and communication between individual CPU nodes for the flag lattice. The thresholds are shown in Table 1.
The correction step scales with t ∼ N 3 , but since it is only executed every few hundred regular update steps, this is not of concern. The correction step also counts the number of changed lattice points in order to keep track of the errors.

Setup
The simulation setup consists of a single RBC with R = 4 μm in Equations (14) and (15). The RBC is placed vertically in a rectangular channel with dimensions L x = 42.7 μm, L y = 12.0 μm and L z = 10.0 μm (periodic in x) as in our previous work 11 and as shown in Figure 8. The initial RBC position is slightly off center in the y and z direction (y initial = 1.50 μm, z initial = 0.833 μm). The fluid moves in x direction at an average flow velocity of v avg = 1.5 mm/s by imposing a volume force (pressure gradient). The simulation time is 9.26 seconds (60 Million integration steps). Figure 9 shows snapshots of the first 0.2 % of the simulation for = 3, in which the cell crosses the periodic boundary for the first time.
F I G U R E 9 Eight snapshots of the simulation with = 3 at a distance of 18 750 integration steps or 2.89 ms each. The cell surface is visualized as a gray wire frame and the inside/outside data is rendered volumetrically in red. The volume tracking works very accurately, even when the cell is cut in half while crossing the periodic boundaries [Colour figure can be viewed at wileyonlinelibrary.com] Depending on the viscosity contrast , there is a shape preference for either the "slipper" (elongated shape, asymmetric position in channel, rotating surface) or the "croissant" (contracted shape, symmetric position in channel center, stationary). For low values of , croissants are preferred while large values of lead to slipper states. Comparing the transition point to a recent set of BIM simulations, 11 which can naturally and exactly deal with inside/outside viscosity contrasts, will validate our algorithm. Figure 10 depicts the different behavior of the RBC when is varied between 1 and the physiological value of approximately 5. Besides visual inspection, there are two quantitative indicators for the cell shape: center of mass radial displacement d and asphericity a. Both are scalar values that change over time. Figure 11A shows the radial displacement d ∶= √ Δy 2 + Δz 2 of the center of mass from the middle of the rectangular channel over time for two different values of . The two distinct stable cell shapes are represented by either the graph dropping to zero (croissant) or the graph oscillating around an offset (slipper). Figure 11B shows the asphericity-a scalar value indicating how nonspherical the surface is. The asphericity is, according to Fedosov et al, 5 defined as follows: First, the center of mass ⃗

Results
x is calculated from all N membrane vertices at locations ⃗ Then, we define the gyration tensor S by Equation (18). Finally, with 2 x , 2 y , and 2 z being the eigenvalues of S, the asphericity a is defined by Equation (19).
As can be seen, both the LBM and BIM graphs for = 1 quickly converge to zero for the radial displacement and to a small constant offset for the asphericity. The LBM and BIM graphs for = 5 show pronounced oscillations which are caused by the cell membrane continuously rotating around the cell interior (so-called tank-treading) and the offset from zero indicates that the cell is located asymmetrically in the channel. LBM and BIM differ only slightly in offset and phase of the oscillation, while the oscillation frequency and amplitude are almost the same. Possible explanations for this difference are that the exact flow rate in LBM might mismatch by a few percent compared to BIM or that in the BIM simulations a different cell surface tessellation algorithm with only 2048 triangles is utilized. Figure 12 shows the averages over the last 0.2 s econds, which is approximately the period of cell rotation, of the radial positions from Figure 11A. The resulting diagram represents the phase change of the RBC at ≈ 4.75, where the RBC changes from croissant ( < 4.75 ) to slipper ( > 4.75 ) in good agreement between BIM and our LBM tracking algorithm.
Every LBM data point in Figure 12 corresponds to approximately 2 weeks of compute time on 16 cores of two Intel Xeon E5-2680 CPUs with the fast tracking algorithm of Section 3.2. When instead using the ray-cast algorithm for every lattice point (N 5 scaling) in every simulation time step, the compute time for the same simulation would be approximately three years. With the ray-casting algorithm only applied for the points close to the surface (N 4 scaling) in every time step, compute time would be 4 months. However due to the parallelization of the IBM in the ESPResSo simulation package, in multi-CPU parallelization for any CPU core only part of the cell membrane is known, making the ray-cast-based algorithms very difficult to parallelize. The comparison of compute times instead is done with the simulation executed on only a single CPU core and the compute time is extrapolated to what it would be on 16 cores.

CONCLUSION
We presented an efficient tracking algorithm to distinguish the interior fluid of a dynamically deforming RBC from the outside fluid during a lattice-Boltzmann-immersed-boundary simulation. By calculating the scalar products of area-weighted surface vertex normals with local distance vectors between the surface vertices and the LBM lattice points, we track the enclosed cell volume. As our algorithm treats only those LBM lattice points which are in immediate vicinity to the RBC membrane, it is capable of very accurate discrete volume tracking without significantly impacting simulation performance.
As one particular application, we examined a RBC with viscosity contrast flowing through a microchannel. The results demonstrated good agreement between the LBM-IBM approach and BIM simulations including viscosity contrast. Finally, our method is not restricted to LBM simulations but can be employed equally well in combination with other grid-based approaches such as finite-difference or finite-volume methods.

AUTHOR BIOGRAPHY
Moritz Lehmann was born in 1997 in Bavaria, Germany. During school he taught himself multiple programming languages and developed his first software project called PhysX3D, a real-time 3D n-body simulation with a custom graphics engine for orbit plotting. After the Abitur he studied physics at the University of Bayreuth. In addition to studying biophysics, he is currently doing his PhD in theoretical physics -specializing in high-performance GPU programming with OpenCL -on an efficient GPU implementation of the LBM named FluidX3D, which speeds up complex simulations with free surfaces, particles and thermal convection from days to minutes of compute time while at the same time visualizing results in real time. inout_Vector (x , y , z) , // segment and a triangle in 3D 14 inout_Vector (x +0.01 , y +0.03 , z +1.04) ) , 15 inout_Triangle (