A scalable solver for a stochastic, hybrid cellular automaton model of personalized breast cancer therapy

Mathematical modeling and simulation is a promising approach to personalized cancer medicine. Yet, the complexity, heterogeneity and multi-scale nature of cancer pose significant computational challenges. Coupling discrete cell-based models with continuous models using hybrid cellular automata is a powerful approach for mimicking biological complexity and describing the dynamical exchange of information across different scales. However, when clinically relevant cancer portions are taken into account, such models become computationally very expensive. While efficient parallelization techniques for continuous models exist, their coupling with discrete models, particularly cellular automata, necessitates more elaborate solutions. Building upon FEniCS, a popular and powerful scientific computing platform for solving partial differential equations, we developed parallel algorithms to link stochastic cellular automata with differential equations ( https://bitbucket.org/HTasken/cansim ). The algorithms minimize the communication between processes that share cellular automata neighborhood values while also allowing for reproducibility during stochastic updates. We demonstrated the potential of our solution on a complex hybrid cellular automaton model of breast cancer treated with combination chemotherapy. On a single-core processor, we obtained nearly linear scaling with an increasing problem size, whereas weak parallel scaling showed moderate growth in solving time relative to increase in problem size. Finally we applied the algorithm to a problem that is 500 times larger than previous work, allowing us to run personalized therapy simulations based on heterogeneous cell density and tumor perfusion conditions estimated from magnetic resonance imaging data on an unprecedented scale.


| INTRODUCTION
Mathematical modeling and computer simulations, informed by patient-specific clinical data, can be used to make personalized predictions of response to cancer therapy. 1 The methodology can in principle be used to design patientspecific treatment plans and constitutes a promising approach to personalized cancer medicine. One of the main goals is the development of quantitative and computational tools that can effectively and efficiently simulate the outcome of multiple therapeutical strategies in individual patients. However, the complexity, heterogeneity and multi-scale nature of cancer present severe computational challenges to that goal. Specifically, treated cancer tissue entails multiple interacting processes occurring at different spatio-temporal scales (for instance, intracellular signaling pathways, drug pharmacokinetics or single cell fates such as division or death). One approach to model the interactions is the hybrid cellular automata (HCA) framework coupling discrete cellular automata (CA) and continuous model components accounting for the different phenomena and scales. 2,3,4,5 To simulate HCA models, computational algorithms have to balance the numerical schemes used to efficiently solve the different discrete and continuous formalism together with the sharing of partial states between them. 6,7 Computational time can be easily misspent in situations in which solvers are on hold while others need to reach stationary states. Furthermore, because different solvers can have different numerical meshes in space and time, resolving states sharing is not always a straightforward task. The large number of cells and the heterogeneity present within the tissue add to the complexity of simulating cancer tissue. Tumors, for example, frequently have areas with different cell densities and perfusion characteristics, cells and vessels may have different characteristics, and so on. As such, heterogeneity is known to influence treatment outcomes, large scale simulations that can capture it are needed. Furthermore, a simulation algorithm is only useful in clinical settings if the computational time is compatible with clinical decision-making. As a result, parallel computing is expected to play an important role.
FEniCS is a finite element computing platform for solving partial differential equations (PDEs) that has been fundamentally designed for parallel processing. 8 While FEniCS has already been extended to couple PDEs with ordinary differential equations (ODEs) associated with mesh nodes, the coupling with stochastic CA models required to solve HCA, is not straightforward. Specifically, mesh partitioning methods used by FEniCS are unsuitable for applying CA rules based on values in different CA meshes. We solved this issue by developing an algorithm that creates a map of what information needs to be exchanged between processes during each CA update. In addition, we implement a priority system to resolve potential conflicts caused by multiple processes attempting to set a CA node value at the same time. This, combined with an efficient and lightweight random number generation implementation, enables us to reproduce stochastic model simulations, produce unbiased results, and claim statistical significance.
To demonstrate the potential of our approach and algorithms, we consider an updated multi-scale HCA model, previously used for simulating personalized breast cancer therapy. 9,10 In previous work, 9 model simulations were used to explain treatment outcomes at the level of individuals and suggest more successful regimes for tumors that did not respond to therapy. These findings were encouraging, but they were limited to small sections of the tumor with only a few hundred biological cells and were simulated on a single CPU. While the new model is parameterized using the same methods as described, 9 this work is dedicated to describing the unique numerical schemes implementing the stochastic hybrid discretecontinuum modeling. The new parallel solver allows to simulate more clinically relevant pieces of tumors and to capture the tumor heterogeneity as observed in magnetic resonance imaging (MRI) data. In addition to testing the solver scalability of single core performance, we performed a weak scaling study on a cluster of up to 80 cores as problem size increased. Finally, we demonstrate that it is now possible to run simulations of 2D tumor sections 500 times larger than previously. 9,10 2 | METHODS

| Magnetic resonance imaging
Variation in cellular and vascular density across breast tumor tissue of one patient was assessed with MRI. The patient underwent MRI examinations before the start of treatment, and after 1 and 12 weeks of neoadjuvant treatment. Examinations were performed on an ESPREE 1:5 T MR scanner (Siemens, Erlangen, Germany) equipped with a phased-array bilateral breast coil (CP breast coil, Siemens, Erlangen, Germany). The MRI protocol was a state-of-the-art MRI protocol 11 with T2-weighted, diffusion-weighted (DW), and dynamic contrast-enhanced (DCE) MRI. DCE-images were acquired using a radial, spoiled gradient echo with k-space weighted image contrast (KWIC), using spectral adiabatic inversion recovery (SPAIR) for fat-suppression (TE = 2:59 ms, TR = 5:46 ms, flip angle = 15 ∘ , field of view = 320 mm Â 320 mm, in-plane resolution = 1 mm Â 1 mm, slice thickness ¼ 15 mm). After eight pre-contrast series, the contrast agent (Gadovist, Bayer Pharma, Germany) was administered at a dosage of 0:8 mmol kg À1 , at a rate of 3 ml s À1 , followed by a 20 ml saline flush. Subsequently, 32 post contrast image-series were acquired at a frequency of 1=13 s À1 . DCE images were analyzed using an extended Tofts two-compartment pharmacokinetic model, yielding 1 mm Â 1 mm-resolution maps of in-vivo perfusion parameters in the tumor. Perfusion parameters calculated includes the permeability-surface area product, K trans of the vasculature, volume fraction of the vascular space, v p and volume fraction of the extravascular, extracellular space, v e . Additionally, the cellular density was estimated as Image analysis was performed using nICE (Nordic NeuroLab, Bergen, Norway). Patient specific arterial input function was sampled from a region of the right atrium of the heart, visible in the DCE images.

| Multi-scale mathematical model
We begin by summarizing the multi-scale mathematical model of breast cancer and its treatment by a combination of administrated drugs. A more in-depth biological and clinical discussion of the model can be found in our previous work. 9 Using hybrid CA, 2,3 the model accounts for the response of a 2D cross section of tumor tissue to a combination of chemotherapeutic and anti-angiogenic agents. Thus, a tumor section is represented by a finite regular square lattice L, consisting of a set of nodes labeled by their positions x L, x ¼ iΔx, jΔx ð Þ , i ¼ 0, ::,n À 1, j ¼ 0, …, m À 1, Δx being the distance between nearest nodes. Biological cells and cross-sectional cuts of functional blood vessels are modeled as individual agents occupying a single lattice node. Microenvironmental factors in the tissue section, such as oxygen, are modeled as continuous variables over the domain Intracellular and intravascular processes are modeled as continuous variables associated to each cell and blood vessel, respectively. To account for cell and blood vessel dynamics and the molecular factors that control them, we build five interlinked model modules: the cellular, subcellular, vascular, intravascular and extravascular-extracellular modules. Figure 1 shows a diagram with the main components of each module, the interactions between them and the model formalism used in each case. A more detailed description of each module is provided below.

| Stochastic cellular automaton for the cellular module
The presence of cancer and other cells, referred here as stroma, on the lattice sites at time t, is described by a function ℒ that takes three possible values: , if x is occupied by a stroma cell. At most one cell can occupy one site of the grid. It is assumed that cancer cells can divide or die but cell movements are neglected. Cancer cell division is controlled by an internal cell cycle clock described in the subcellular module below. When a cell at position x cell is committed to divide, it can be killed by the chemotherapy agent i ¼ 1, 2, 3 with a probability given by: where The value max G i,0 ð Þ refers to the maximal concentration of chemotherapy i in the blood and b i is a parameter describing the sensitivity toward chemotherapy i. If the cell at position x dies at time t, it is removed from the computational grid by setting L x,t ð Þ¼0. If chemotherapy does not kill the proliferating cell, a daughter cell is placed at an empty (Moore) neighbor location with the highest oxygen pressure. If, however, no free space is available in the neighborhood, the cell cycle of the parent cell is reset to zero and no daughter cell is produced.
Stroma cells in our cellular automaton do not proliferate or die. Thus, their role in the model is restricted to competition for space and oxygen with cancer cells and to production of the vascular endothelial growth factor (VEGF) under hypoxia (low oxygen tension).

| Stochastic cellular automaton for the vascular module
We consider only perfused functional blood vessels and assume that all are cylindrical and perpendicular to the modeled tumor section. The presence of cross-sectional vessel cuts in the lattice at time t is given by the function G, with G x,t ð Þ¼ 1 representing the presence of a functional vessel and G x,t ð Þ¼ 0 its absence. Vessel dynamics is modeled by a birth-death process, with the probability of creating and removing vessels depending on the extracellular spatial distribution of the angiogenic factor VEGF. Specifically, we assume there is a range of VEGF concentrations where vessels are viable but outside that concentration range vessels are more likely to regress and disappear. Therefore, at every fixed time step Δv, we define the probability of creating and removing a vessel at x L as: where V low and V high are lower and upper thresholds of VEGF concentration where functional vessels are viable, and V x ¼ V x,t ð Þ denotes the extracellular concentration of VEGF at location x at time t, which is described below. Extending our original model, 9 we now assume that the probability of birth p birth x, t ð Þ is proportional to the VEGF concentration V x, t ð Þ, instead of being constant. Following Owen et al, 12 we also assume that at each time-step Δv, the expected value of new created vessels is given by n new : where Pr sprout is the maximum probability that an endothelial sprout emerges from a surface of a vessel and forms a new vessel, gis the set of all blood vessel locations at time t. r x ¼ r x, t ð Þ is the vessel radius, r 0 is the average radius of initial vessels, h c is the height of the simulated tissue layer and V sprout is the VEGF concentration at which the probability is half-maximal.

| Systems of ordinary differential equations for the intravascular module
The time-dependent concentrations of four drugs in the blood, that is, Avastin and a cocktail of three chemotherapies (Fluorouracil, Epirubicin, and Cyclophosphamide) together known as FEC100, are modeled by their respective pharmacokinetic equations, dosage and drug administration schedule. It is assumed that all vessels share the same drug concentration at a given time point. For Avastin, a two-compartment model is used: where are the concentrations of Avastin at time t in the plasma and peripheral compartments, respectively, and q, cl, v 1 , v 2 , are kinetic and compartment volume parameters. For Fluorouracil a single compartment model is used: and where G 1,0 ¼ G 1,0 t ð Þ is the plasma concentration at time t, d n is the nth dose, and v max , k m and V are kinetic parameters. For Epirubicin, a three-compartment model is used: and where G 2,0 ¼ G 2,0 t ð Þ is the plasma concentration at time t and G 2,1 ¼ G 2,1 t ð Þ and G 2,2 ¼ G 2,2 t ð Þ are concentrations in two peripheral compartments at time t. The parameters q 1 , q 2 , q 3 , cl 2 are kinetic rates and w 1 , w 2 , w 3 are compartment volumes. For Cyclophosphamide, a single compartment model is again used: and where G 3,0 ¼ G 3,0 t ð Þ is the plasma concentration at time t and cl 3 , u are kinetic parameters. For G 2,0 and G 3,0 , closedform solutions, described in elsewhere, 13 were used for the calculation of plasma chemotherapy concentration.

| Ordinary differential equations for the intracellular module
For each cell c at position x c , the intracellular amount of p53, denoted by P c ¼ P c t ð Þ, and the intracellular amount of VEGF, denoted by V c ¼ V c t ð Þ, are modeled by the following system of equations: where k 1 , k 0 1 , K p53 , k 2 , k 0 2 , k 00 2 , J 5 and K V are kinetic parameters. The function K x c ¼ K x c , t ð Þ represents the oxygen concentration at cell location x c and is modeled by a PDE (described below).
Similarly, the progression though the cell cycle ϕ ¼ ϕ t ð Þ of each cell c at position x c is modeled by the equation: where T and K ϕ are scalar parameters. At start, all cancer cells are assigned a random ϕ value between 0 and 1. The value of the cell cycle of a newly divided cell is set to ϕ ¼ 0, and it increases until the first cellular update after ϕ ≥ 1. At that point, the cancer cell is committed to divide and will either divide or die due to the effect of chemotherapy as described above.

| Reaction-diffusion equations for the extravascular-extracellular module
Each of the four drugs (Avastin A, Fluorouracil G 1,0 , Epirubicin G 2,0 , and Cyclophosphamide G 3,0 ) is modeled as moving from the vessels (concentrations denoted by A 1 , G 1,0 , G 2,0 , G 3,0 respectively) to the extracellular-extravascular space (concentrations denoted by A, G 1 ,G 2 , G 3 , respectively). These rates depend on the difference in concentrations between the vessel U v ¼ U v t ð Þ and the extracellular-extravascular space (EES), U x ¼ U x, t ð Þ, and the drugs' vessel surface permeabilities P U , and are modeled as: where the integration domain V 0 is the total volume of the cross-section, and S x ¼ S x ð Þ is the vessel surface-per-volume ratio for a vessel at x.
where r 0 is the average radius of the initial vessels, r x ¼ r x, t ð Þ is the radius of the vessel at x, and h is the vessel length (which is the height of tissue section and equal to the cell size Δx). Function δ is the Dirac delta function and for convenience, we introduce a short-hand for the sum of Dirac delta functions over a set of points X a : This forms part of the reaction terms in Equations (17) to (21) below. For chemotherapy and Avastin, we assumed that P U ¼ c U P Gado . P Gado is the permeability of gadolinium, the contrast agent used in MRI, and c U ℝ is a scaling constant. Given that the initial transfer rate , is the Tofts model permeability-surface area product of gadolinium, 14 where N init v is the initial number of vessels and S MRI is the permeability-surface area product of a voxel in MRI, we can derive that It can then be substituted in Equation (14) for Avastin and the chemotherapies to give: Oxygen in the EES is modeled by the following reaction-diffusion equation for all x D and t > 0: where D k , ϕ K , K 1 and K 0 are scalar parameters, gis the set of all cell locations. s K 0, 1 f g is constant. When s K ¼ 1, this becomes to the time-dependent reaction-diffusion PDE, while s K ¼ 0 corresponds to the steadystate equation. The latter is used to initialize the oxygen pressure given initial locations of cells and vessels. The first term in the right-hand-side of Equation (17) accounts for oxygen diffusion, the second term accounts for the consumption by cells and the third term represents point sources accounting for the flow of oxygen from the blood vessels.

Chemotherapies
The concentration of each chemotherapy G i ¼ G i x, t ð Þ, with i ¼ 1, 2, 3 in the EES is modeled by the following reactiondiffusion equation for x D and t > 0: where D G i is the diffusion constant of G i , ψ G i is the linear decay rate of G i . G i,0 is the concentration of chemotherapy i in the blood. The first term in the right-hand-side of Equation (18) accounts for diffusion of the concentration, the second term accounts for drug decay, the third term represent point sources accounting for the flow chemotherapy from the vessels.

VEGF-Avastin complex
Avastin is a VEGF-inhibitor that binds to the VEGF and produces an inactive VEGF-Avastin complex, thereby reducing the availability of active VEGF. The interactions between the Avastin, VEGF and VEGF-Avastin complex concentrations Þ are given by the following system of reaction-diffusion equations for x D, t > 0: where D V , D A , D C , a, b, k a , k d , ψ V , ψ A , ψ C and c A are scalar parameters and V c is the time-dependent intracellular concentration of VEGF described in Equation (11) and A 1 is the time-dependent concentration of Avastin intravascularly. The first term in the right-hand-side of Equation (19) accounts for diffusion of the VEGF concentration, the second term accounts for the release of VEGF by cells, the third and forth terms account for VEGF binding/unbinding to the VEGF inhibitor Avastin and the last term accounts for natural VEGF decay. In Equation (20), the first term in the right-hand-side accounts for diffusion of the Avastin concentration, the second and third terms account for Avastin binding/unbinding to VEGF, the forth term accounts for drug decay and the last term represent point sources accounting for the flow of Avastin from the vessels. Finally, the first term in the right-hand-side of Equation (21) accounts for diffusion of the VEGF-Avastin complex, the second and third terms account for Avastin binding/unbinding to VEGF and the last term represents natural decay of the complex.

| Patient-specific model initialization and parameterization
All simulations run in this study were personalized by data from a specific breast cancer patient (Patient 3) from a clinical trial. 9 Model initialization and parameterization are, unless differently specified, as in that publication, where clinical, histological, MRI and molecular data were used to estimate model parameters and initial values. Patient 3 is a complex patient and was chosen for exhibiting very heterogeneous perfusion conditions as observed by MRI. More precisely, MRI showed a tumor core with very low perfusion and viable cells and a tumor edge with much higher perfusion values. All parameter used for the simulations are listed in Table 1.

| Numerical methods
The main algorithm for the numerical solution of the full model equations is presented in Algorithm 1 below. Further specification of the numerical techniques for solving the separate subproblems are given in the text below. The numerical solver was implemented using the open source FEniCS Project finite element library. 8 The code is available at https://bitbucket.org/HTasken/cansim.

| Numerical solution of the PDE systems
The time-dependent, nonlinear systems of PDEs describing the evolution of the oxygen and chemotherapy concentrations and the VEGF-Avastin complex (Equations (17) to (21)) are solved using the finite element method in space and finite difference method in time. The non-linear problems are solved using the Newton-Raphson method and all linear systems are solved using iterative Krylov methods designed to scale to large-scale simulations. The computational domain D, representing a tissue slice, is a rectangular region: We call T the base regular square mesh with n Â m vertices associated with D. The vertices are the potential locations for the vessels and biological cells. To avoid having the accuracy of the PDE solutions limited by the size of the biological cells, a finer mesh is used for the finite element discretization of the PDEs. Specifically, we used a uniform refinement T H of the base mesh with 2 Á 2 2 n À 1 ð ÞÂ2 2 m À 1 ð Þtriangular elements. The same refined mesh was used for the discrete functions, that is, the delta function representation of the vessels and biological cells. See section 2.5 for more detail. The choice for the number of refinement was determined by a mesh independence study, in which the relative error of the solution was stable after twice refinement.

Solving the chemotherapy concentration equations
We first consider the numerical solution of the system of chemotherapy equations, 18 for i ¼ 1, 2, 3. Each equation is time-dependent but linear, and the equations are independent of each other. We first discretize each PDE by the implicit second-order Crank-Nicolson scheme in time. At each time t k for k ¼ 1, …,K, given the concentrations at the previous time G kÀ1 i , we solve for the concentrations G k i using the finite element method with continuous piecewise linear finite elements relative to the mesh T H . The resulting linear systems of equations are symmetric and positive definite, and were solved (with optimal complexity) using a conjugate gradient (CG) solver with algebraic multigrid (BoomerAMG 15 ) preconditioning with a relative solver tolerance of 10 À8 .

Solving the oxygen concentration equation
We consider the time-dependent version of the nonlinear oxygen concentration Equation (17) (with s K ¼ 1). As for the chemotherapy equations, we first discretize the PDE by the implicit second-order Crank-Nicolson scheme in time. The resulting nonlinear system of differential equations at each timestep t k is discretized using continuous piecewise linear finite elements relative to the mesh T H . We solve the resulting nonlinear system using a Newton iteration, with a tolerance of 10 À8 . The inner loop linear systems are symmetric and positive definite, and were solved with a CG Krylov solver with Jacobi preconditioning, and a relative solver tolerance of 10 À8 .  (17) with s K ¼ 0. 4: Define the initial condition for the cell cycle ϕ based on a uniform random distribution: and compute the intracellular levels of P t ð Þ and V c t ð Þ at initial time (t ¼ 0) by solving (10) and (11), respectively, using the initial oxygen concentration K Á ,0 ð Þ. 5: Define the initial chemotherapy concentrations by setting G j Á ,0 ð Þ¼0 for j ¼ 1, 2, 3. Define initial conditions for V , A and C by setting V Á ,0 ð Þ¼0, A Á ,0 ð Þ¼0 and C Á ,0 ð Þ¼0.

15: end while
We also used a linear approximation of the nonlinear system that could lead to significant CPU time reduction. In particular, the non-linear term at timestep t k can be approximated by denotes the solution of oxygen at time k À 1. The resulting linearized system of differential equations at each timestep t k is discretized using continuous piecewise linear finite elements relative to the mesh T H . The resulting linear system of equation is symmetric and positive definite, and were solved (with optimal complexity) using a CG solver with algebraic multigrid (BoomerAMG 15 ) preconditioning with a relative solver tolerance of 10 À8 . We found that the difference between the non-linear and linear approximations to be negligible (with a difference of less than 0.03%).
Solving the coupled system equations of VEGF/Avastin complex The system of PDEs was discretized by the implicit second-order Crank-Nicolson scheme in time. The resulting nonlinear system of differential equations at each timestep t k is discretized using continuous piecewise linear finite elements relative to the base mesh T H . We solve the resulting nonlinear system using Newton's method, and the (nonsymmetric) linear systems were solved with an iterative Krylov (GMRES) solver with Jacobi preconditioning. The stopping criteria for the Newton and Krylov solvers were set to 10 À8 for both.
The coupled system of VEGF/Avastin complex can be simplified when Avastin is not administered. In this case, the PDE system reduces to a linear PDE in V alone. The resulting linear equation is symmetric and positive definite, and were solved (with optimal complexity) using a CG solver with algebraic multigrid (BoomerAMG 15 ) preconditioning with a relative solver tolerance of 10 À8 .

| Numerical treatment of ODEs
The ODEs for the intravascular module and cell cycle can be solved analytically 13 and evaluated at the necessary time points. For the coupled system of the intracellular module, solution to P x, t k þ dt ð Þcan be written in explicit form given solution P x, t k ð Þ at t ¼ t k . For V c x, t ð Þ, VODE solver with implicit Adams method from SciPy package was chosen for the non-stiff problem for each cell. This solver uses an adaptive time step, and the maximum time step was set to 1min:.

| Numerical treatment of cellular automaton
The biological cells and vessels are in the equations above represented as delta functions, and these delta functions are here numerically approximated with continuous, piecewise linear functions defined relative to the refined mesh T H . Such an approximated delta function, δ x À x I ð Þ, will take the form of a right square pyramid centered at x i with the value of 0 at all vertices, except at x i where it will take the value that makes the integral over the approximated delta function become 1 (as it should for a delta function).
For each cancer cell at x ¼ x i , x j À Á ℒ with cell cycle ϕ x ≥ 1, the cell is killed with probability drawn from a cumulative Beta distribution Beta 1, β ð Þ,β > 0. If the cell is not killed, a new cell is placed at the lattice site x 0 x p ,x q À Á ,p ¼ i À 1,…, i þ 1, q ¼ j À 1, …, j þ 1 with the highest oxygen concentration K. No cell is placed if no space is available in the neighborhood, i.e. if all (Moore) neighbor sites are occupied by other cells. If more than one cell attempts to place a new cell at the same lattice site, the cell with the highest ϕ wins. The new cell inherits its minimum cell cycle duration T min . Next, ϕ x and ϕ x 0 are set to ϕ x mod 1 ð Þand 0 respectively. The intracellular VEGF and TP53 levels P, V c are set to zero at both x and x 0 at time t. The sequence of updates for proliferation-ready cells is asynchronous such that the new state of a cell do not affect the calculation of states in neighboring cells.

| Linking spatially parallel continuous and discrete modules
The computational expense of the whole algorithm is dominated by the operations within each time step. Each time step consists of "continuous update"-solving a sequence of PDEs and ODEs, and "discrete update"-computing states of cellular automaton. Due to the inherent dependencies between the continuous and discrete components, parallelism in the time variable is more difficult to approach. Instead, since the solvers in the FEniCS system are spatially parallel, we consider a fork-join strategy. They are simple to integrate with the rest of the algorithm as long as the communication overhead is kept to a minimum.
Standard domain decomposition techniques, iterative linear solvers, and preconditioners were used to parallelize the PDE assembly and solvers. The parallelization of the ODEs was trivial: they are made up of spatially decoupled problems that are described on each biological cell and can thus be solved independently. The parallelization of the cellular automaton models necessitated further consideration because they are dependent on the oxygen levels in neighboring biological cells. Retrieving that information requires inter-process communication if that cell is at the boundary of the domain decomposition. Cells are classified into two categories based on their location within the sub-domain. Cells located inside a sub-domain are marked as internal, while those located at the boundary of two or more subdomains, shared by two, three or four processors are marked as inter-facial. To achieve this, we added a setup routine (Algorithm 2) to the algorithm that allows parallel processes to determine which process owns their neighboring biological cells. These neighbor maps allowed the cellular automaton to retrieve neighboring biological cell values with minimal overhead.

| Choosing a scalable parallel random number generator
Since the model is stochastic, having a scalable parallel random number generator (RNG) is an important aspect of parallelization. To investigate and understand the simulation results, we must be able to reproduce the same scenarios and obtain the same confidence intervals each time the same stochastic experiment is performed; when debugging such parallel stochastic application, we also need to reproduce the same result to correct any anomalous behavior. Furthermore, for applications in predicting treatment effect, it is important to produce unbiased results and claim statistical significance. Implementation of pseudo RNG also reduces the number of simulations needed for claiming statistical significance in cross-comparison between different treatment regimens since the simulation would be the same until the point of divergence induced by different regimens.
Pseudo RNG in high-performance computing (HPC) applications, particularly in our HCA model, requires the following criteria. It cannot download, or store the hundreds of thousands of numbers needed to reproduce the experiments due to constraints of the server. The random number sequence assigned to each cell or vessel must not depend on the number of processors, that is, each cell or vessel should autonomously obtain either its own random sequence or its own sub-sequence of a global sequence. The pseudo RNG must also come with good statistical properties, approximating as close as possible a Get oxygen concentration 8x ∈ L x ð Þ 6: Get and update ϕ x ð Þ 7: Gather x such that ϕ x ð Þ > 1 8: update cells according to cellular automaton rules 9: 10: if more than one processor attempt to update at the same x then 11: update L x ð Þ from the processor with higher ϕ x 0 ð Þ value 8x 0 in the neighbour 12: Set automata, vegf, p53 and cell cycle value of the new cancer cell equal to the parent cancer cells 13: end if 14: end for 15: Update map truly random sequence. The RNG should also be memory-efficient as sequences of the RNG will be the length of any given simulation, and the size of the RNG grows at the rate of the total number of vertices. We have therefore chosen permuted congruential RNG 16 which partitions the main sequence of the generator into sub-sequences for the sake of memory-efficiency, good statistical properties, small state and a more than sufficient period.

| Small-scale simulations
We first run personalized simulations of a patient analyzed previously with a previous version of the model and using a non-parallel solver. 9 These simulations correspond to a 200 μm Â 300 μm tissue section, a size that allowed us to use exact initial cell positions available from histopathological slides. We use here the same initial conditions and previously published model parameters 9 (see Table 1). As both the model and the solver are now extended, we do not expect to get the same solutions as in previous work. We tested different perfusion conditions estimated from MRI images and confirmed that the treatment outcome after 12 weeks strongly depends on the perfusion conditions used. In Figure 2, we show a simulation representing a high perfusion condition estimated at the outer edge of the tumor. In the simulation the initial cell density is reduced by half after the first chemotherapy shot at week 0. Chemotherapy is efficiently distributed in space but it is washed out after a few days. As approximately half of the cells are not dividing while the chemotherapy is available, they are not killed and the number of cells grows significantly before the second chemotherapy shot at week 3. After week 3 instead, all cancer cells are killed as they divide when enough chemotherapy is available. Comparing with previous simulations, the final outcome is matched but the transients are slightly different. We The lower panels depict the spatial distribution of oxygen, VEGF and chemotherapy at weeks 0, 1, and 3 in the tissue section. The presence of cancer cells is shown in black dots in the oxygen panel. Since the vessels are point sources of those molecules, areas with high blood vessel density can be perceived in the oxygen and chemotherapy panels attribute it mostly to solving time-dependent PDEs instead of the steady-state solver considered previously. This difference in solvers adds a factor of seven to the run time on a single core.

| Solver performance
To evaluate the runtime and scalability of the complete numerical solver, we consider a simulation of a 1 mm 2 tissue slice (D ¼ 0, 1 ½ Â 0, 1 ½ mm 2 ), with n ¼ m ¼ 100. Initial cell and vascular densities were assumed to be 0.4 and 0.12 respectively. These settings reflect a real patient simulation scenario.

| Serial runtime profile
The average intra-and inter-update timing of the solver was first carried out on a single CPU core. The total timings, broken down by computational processes and by model modules, are shown in Figures 3 and 4.
We observe that the assembly of the discrete PDE operators dominates the runtime, specifically the time-dependent assembly of the non-linear PDEs ( Figure 3A). As expected, the solution of the linear systems (PETSc Krylov solver) is also a substantial component. The other components and in particular the solution of the ODEs represent a nearly negligible cost. Overall, we found similar cost distribution comparing intra-and inter-update, indicating that communication during discrete updates were efficient.
The breakdown by modules shed some further insights on the cost of assembly: the solution of both VAC and Oxygen modules are obtained by solving non-linear PDEs, and the system is re-assembled for each Δt. To reduce the runtime of these modules, an explicit splitting scheme could be used instead (corresponding to a single step of the F I G U R E 4 Average inter-update timing of the model with Δt ¼ 30 min solved over t ¼ 120 min F I G U R E 3 Intra-update timing of the model with Δt ¼ 1 min solved for t ¼ 30 min Newton iteration) at the cost of numerical accuracy. Time spent on chemotherapy module comes second as there are three types of chemotherapies to be solved separately. Initialization includes mesh initialization as well as function initialization for each module. This is done only once at the beginning of the whole simulation, and is therefore expected to become negligible for longer simulations.

| Parallel scalability
We also investigated the parallel (weak) scalability of the assembly and solution of the PDEs. Weak scalability can be examined for a series of problem sizes by assigning a constant problem size to each processing element-typically for T A B L E 2 Timing of temporal chemotherapy update (t = 30 min) Note: N proc is the number of processors, N it is the number of Krylov iterations, N dofs is the total number of degrees of freedom, N dof is the average number of degrees of freedom per process, t A is assembly runtime, WSE A is the weak scaling efficiency for the assembly, t S is the linear solver runtime, and WSE S the linear solver weak scaling efficiency. Δt was kept constant during the update and the results shown are the minimum of five repeated runs of the updates with random initialization of cell and vessel configurations. Intravascular concentration was kept constant during the simulation.
T A B L E 3 Timing of temporal oxygen update (t = 30 min) Note: N proc is the number of processors, N it is the number of Krylov iterations, N dofs is the total number of degrees of freedom, N dof is the average number of degrees of freedom per process, t A is assembly runtime, WSE A is the weak scaling efficiency for the assembly, t S is the linear solver runtime, and WSE S the linear solver weak scaling efficiency. Δt was kept constant during the update and the results shown are the minimum time of five repeated runs of the updates with random initialization of cell and vessel configurations.
an increasing amount of processor cores. Perfect weak scaling is achieved if the run time stays constant while the workload is increased in direct proportion to the number of processors. The weak scaling factor was defined as where t 1 and t N is the runtime when using 1 and N processors, respectively, and was calculated for the assembly and solve times separately. All weak scaling experiments were conducted on Saga, the high performance computing facility placed at NTNU in Trondheim, composed of Intel Xeon-Gold 6138 running at 2.0 GHz. Each Saga compute node consists of 40 physical compute cores with 192 GiB memory each, giving approximately 4 GiB memory per physical core. We used FEniCS version 2019.1.0 built with an Intel compiler. These scalability tests were run on two compute nodes totaling 80 physical cores, and with exclusive access to minimize communication overhead. For the sake of comparison, the end time of all temporal updates of PDEs in the model was set to t ¼ 30 min, while cell and vessel configurations were initialized randomly. The parallel runtime and scalability of the chemotherapy equations are presented in Table 2. We observe a nearly optimal WSF ≤ 1:3 ð Þfor the finite element assembly up to 80 cores. For the linear solver, we observe a moderate increase in the WSF up to 2 for 16 cores and 2:5 for 80 cores. The number of iterations of the Krylov solver stay constant however.
For the solution of the oxygen concentration equation, the parallel scalability timings are shown in Table 3. We note that the solution time is around four times that of the chemotherapy equations and thus a dominant contribution. The finite element assembly scales well up to 80 cores (WSE A ≤ 1:4) for both non-linear and the linearized solver. However, by linearizing the oxygen equation, the weak scalability of the linear solver is improved from a WSF of up to 9 in 80 cores to be comparable to the results for the chemotherapy equations (WSE less than 2.24). We also investigated the F I G U R E 5 Simulation of a patient biopsy of size 1 mm Â 1 mm, equivalent to the size of a MRI voxel. The top panel depicts the evolution of the cancer cell density over the course of 12 weeks of treatment. Chemotherapy shots are given at weeks 0, 3, 6 and 9 and are illustrated by a vertical dotted line in the figure. The lower panels depict the spatial distribution of oxygen, VEGF and chemotherapy at weeks 0, 2, 4 and 6 in the tissue section. The presence of cancer cells is shown in black dots in the oxygen panel. Since the vessels are point sources of those molecules, areas with high blood vessel density can be perceived in the oxygen and chemotherapy panels weak parallel scalability of the VAC equations. The results (data not shown) were highly comparable to the scalability of the oxygen concentration solver.

| Simulating a cross-section of a MRI voxel of tissue
The spatial resolution of the available MRI data is defined by the imaging voxels of size 1 mm 3 . Thus, important biological information, such as the perfusion of the tumor is only given per voxel of tissue. As an intermediate step to run multivoxel simulations, here we demonstrate the ability of our parallel solver to run a grid-size equivalent to a cross-section of a MRI voxel. Figure 5 illustrates the cancer cell trajectory under 12 weeks of chemotherapy treatment. Since digitalized biopsy data detailing the exact distribution of cancer and stroma cells are available only for the small tumor section of size 200 μm Â 300 μm, we used the same cell density to approximate the number of cancer and stroma cells in the larger one, but they were randomly distributed in the grid to initialize the simulation. We again use the same model parameters detailed in Table 1. Comparing Figure 5 to Figure 2, we observe similar patterns. Immediately following the initial administration of chemotherapy, the number of cancer cells starts to decrease, reaching a minima about 2.5 days later. At that point, the chemotherapy treatment loses its effect and the cancer cells start to grow back again. This pattern is repeated after the second chemotherapy shot at 3 weeks, while the third shot at 6 weeks kills all existing cancer cells. The panels at weeks 0, 2, 4 and 6 in Figure 5 show the spatial distribution of cells, oxygen, VEGF, blood vessels and F I G U R E 6 Simulation of a digital biopsy sample of size 2 mm Â 1 cm from week 0 to week 1. The top panel depicts the evolution of the cancer cell density of the whole, the edge and the core of the simulated biopsy over the course of 1 week of treatment. This is followed by spatial comparative view of cancer cells distribution and three extracellular variables, from top to bottom, oxygen, VEGF, and chemotherapy between week 0 (left panels) and week 1 (right panels). In the first row, locations of simulated cancer cells were marked in black, while locations of simulated vessels were overlaid in color to represent their permeability in chemotherapy. Intravascular concentration of the three chemotherapies between week 0 and week 1 are shown at the bottom chemotherapy in the simulated tumor cross-section. As expected, the tissue oxygen concentration increases and VEGF concentration decreases when the number of cancer cells decreases. Due to a good tissue perfusion with a large number of vessels and high permeability, chemotherapy reaches the whole tissue at effective concentrations. The total runtime of this 12-week simulation on Saga took approximately 12 h with 80 cores.

| Simulating a cross-section of a digital biopsy sample
To ultimately demonstrate the impact of heterogeneous perfusion conditions on the dynamics of cell growth and killing, we constructed a mesh of size equivalent to 2 Â 10 MRI voxels. This matches with the size of the core needle biopsy taken at the beginning of the treatment in the clinical trial and we will refer to it as a digital biopsy sample. Perfusion related parameters were applied in each of the 20 simulated voxels of a digital biopsy sample as follows. First, the MRI slice with the widest tumor diameter is selected. This image reveals a difference in perfusion conditions between the tumor edge and the core. Starting from the boundary between the core and the edge of the tumor, 5 voxels were chosen radially inwards and outwards respectively, totaling 10 voxels. Another 10 voxels were chosen in the same fashion from their immediate neighbor, resulting in a simulated biopsy of 2 Â 10 voxels. Estimated perfusion parameters, specifically F I G U R E 7 Differences in cell densities between weeks 0 and 1 are depicted as box-plots. Simulated and actual cell densities from magnetic resonance imaging (MRI) were seen in blue and yellow, respectively. The x-axis shows the comparison of 10 biopsy slices. The first box-plot from the left compares the simulation shown in Figure 6 to MRI data. Paired Wilcoxon signed rank test were performed between simulated and actual cell densities for each biopsy. Asterisks were placed above the boxes of comparisons of p-values less than 0.05. Ãp < :05, Ã Ãp < :01 T A B L E 4 Results from the fitted mixed effect model. Intercept and the source of the cell/vessel density (simulated or actual) were considered as the fixed effects; the location of each MRI voxel and the biopsy sample ID was considered as nested random effects to reflect the variation in biopsy samples in either simulated or actual vessel and cell densities. The regression coefficients estimates, standard error as well as the significance of the two fixed effects were shown in the where v e is the estimated fractional extravascular extracellular space volume), from applying the extended Tofts model to the MRI data as estimates of the vessel and cell density respectively. Vessels and cells were then randomly distributed accordingly within each voxel in the grid (100 Â 100 grid points) to initialize the simulation. Figure 6 shows 1-week evolution of one digital biopsy sample under therapy. We can clearly see the complex effect of the heterogeneous perfusion condition illustrated by the heterogeneity in the oxygen concentration. As chemotherapy reaches the tissue, active cancer cells are killed at different rates in various areas of the simulated biopsy. In areas with high perfusion, cells are killed more rapidly, while in those areas with low perfusion, chemotherapy cannot be delivered sufficiently, and more cancer cells survive. In areas with no vessels, cell growth is stalled and bounded by the capacity of the computational grid and the oxygen. In areas with lower initial cell density, cells were dividing albeit at a lower rate. In addition, a combined effect in areas between high and low perfusion is observed. Drug delivery to areas with low perfusion reaches neighboring areas of high perfusion by diffusion instead of via the vasculature. Moreover, dense cell populations from low perfusion areas can grow into areas with high perfusion where more cancer cells were killed by the chemotherapy. Total runtime of the simulation was on average 18 h with 80 cores.
To assess the validity of our model, we extracted 10 biopsy samples using the same method described above. We use data from week 0 of each digital biopsy to initialize the model and differences in cell density between week 0 and week 1 are evaluated between the simulation and the MRI data for each voxel within the 10 biopsy samples (Figure 7). Given that all digital biopsies were collected from the same tumor, a mixed effect model was used to verify if there is any statistical difference between simulated and actual densities. We found no significant difference, thus proving that the numerical simulations can reproduce the observed drug outcome in large, clinically relevant heterogeneous tumor portions (Table 4).

| DISCUSSION
In this work, we present a multiscale HCA model for personalized breast cancer growth under the effect of therapy, together with a parallel numerical algorithm aimed at large scale, clinically-relevant simulation sizes. We discovered that the PDE solvers are the most computationally intensive component of the system, with non-linear coupled equation updates dominating the runtime. In terms of solver performance, the assembly of discrete PDE operators scales almost linearly, while a moderate increase in weak scaling efficiency was observed in linear solver runtime. We simulated tumor dynamics and drug therapies of tumor cross-sections in three different system sizes: a small tumor portion, an MRI voxel of tissue, and a full biopsy sample consisting of 20 MRI voxels. Our results show that the parallel implementation of our model can account for tumor heterogeneity at a clinically-relevant system size while correctly predicting treatment outcome.
Previously, multiple cancer modeling studies have used HCA models that couple CA with ODEs and PDEs, see for instance. 3,4,17,5,9 However, they typically considered small 2D or 3D tumor portions and account for only small number of cells. Multi-scale models representing clinically relevant tumor portions, like the one considered in this study, require parallel computing, specially if they include many PDEs describing the tumor microenvironment or very large ODE systems for cell signaling. A number of computational frameworks for simulating multi-scale models with parallel computing capabilities are available, 18,7 including Morpheus, 19 CompuCell3D, 20 PhysiCell, 21 CellSys, 22 Chaste, 23 Biocellion 24 or Timothy. 25 Each of these tools has their own specific features regarding modeling formalism, implementation, usability and performance. Morpheus and CompuCell3D use cellular Potts model as cell-based model formalism while CellSys, PhysiCell, Biocellion and Timothy use other off-lattice/cell centered approaches. Chaste has more flexibility and allows the user to choose their own cell-based formalism, including CA like in our case. In terms of performance, several simulation frameworks allow multithreading via OpenMP, including Morpheus, CompuCell3D, PhysiCell and CellSys. Biocellion and Timothy, instead, were designed for intense parallelisation between nodes and allow domain decomposition techniques as our solver. These tools can be used for simulating large numbers of PDEs in large domains and up to billions of biological cells in high performance supercomputers. In order to run personalized simulation of breast cancer therapy considered in our previous work 9 but considering clinically relevant tumor portions, we have taken this later approach. Specifically, we build on the domain decomposition capabilities of FEniCS for solving PDEs and efficiently link the stochastic CA models with the continuous PDE and ODE models. To the best of our knowledge, this is the first time that the popular finite element framework FEniCS is extended for simulating a multi-scale HCA model.
In terms of limitations, we used estimated cell density from MRI rather than actual cell distribution captured from a core biopsy staining for cell initialization. This is due to the MRI dataset's inability to properly identify cancer and stroma cells. We do not take into account local differences in cell density in each voxel. As a result, in some cases, simulated treatment outcomes can vary from the actual outcomes. We leave for future work the digitization of the pathological sample and integration of cell identification. Modeling other cell types and their interactions is possible but we did not account for it in this work because the current clinical data cannot inform them on a patient-specific basis. Each MRI voxel contains a large number of vessels, each with a different surface permeability and vascular flow, both of which are important for drug delivery and response. In this study, a voxel was treated as a region of uniform vascular flow and vascular permeability. Improving on this will necessitate MRI data with a higher spatial resolution than that available in the clinical trial under consideration here.
Steady-state models were used in our previous implementation. They are often understood as a simplification of the time evolution ones, with the assumption that the time-dependent solution stabilizes around the steady-state as t ! ∞. In our study, where temporal accuracy is essential, this assumption is invalid for slow perfusion scenarios. However, since it is beyond the reach of our work, we have only put in a limited amount of effort to tune the multigrid preconditioners for this problem in order to improve simulation run-time of 1 week of actual treatment, which is roughly 6 h using 80 cores in cluster).
The ability to simulate clinically relevant pieces of breast tumors under therapy is a significant step toward in silico design of patient-specific treatment plans. However, we have so far only applied this methodology to five patients and much more testing and validation is needed before this methodology can assist oncologist in finding effective therapies for each patient. Once models become sophisticated enough to predict treatment outcomes for a wide cancer patient population, one possibility to test them in a clinical setting is in silico-guided clinical trials for personalized cancer medicine. The goal of such trials is to compare a standard approved therapy with personalized drug schedules optimized through computer simulations.
Improved statistical inference methods that allow successful estimation of patient-specific parameters are needed to achieve this using complex models like the one presented here. We have taken the first steps in that direction by employing Approximate Bayesian computation, 26,27,10 a very computationally intensive technique that certainly benefits from the scalable solver presented in this study.