An efficient and locking‐free material point method for three‐dimensional analysis with simplex elements

The Material Point Method is a relative newcomer to the world of solid mechanics modelling. Its key advantage is the ability to model problems having large deformations while being relatively close to standard finite element methods, however its use for realistic engineering applications will happen only if the material point can be shown to be both efficient and accurate (compared to standard finite element methods), when modelling complex geometries with a range of material models. In this paper we present developments of the standard material point method aimed at realizing these goals. The key contribution provided here is the development of a material point method that avoids volumetric locking (arising from elastic or elasto‐plastic material behavior) while using low‐order tetrahedral finite elements for the background computational mesh, hence allowing unstructured background grids to be used for complex geometries. We also show that these developments can be effectively parallelized to improve computational efficiency.


INTRODUCTION
The Material Point Method (MPM), 1-3 originally proposed as an extension of the Fluid-Implicit Particle (FLIP) method 4 and the Particle-in-Cell method, 5 is promising for modeling problems involving large deformation of solid materials, for example, biological soft tissues, 6-8 explosive materials, [9][10][11] soil, [12][13][14][15][16][17] and snow. 18 The MPM can be regarded as the Finite Element Method (FEM) with moving quadrature points but has a key advantage over the latter in avoiding mesh distortion which can cause inaccuracy and, in severe cases, numerical breakdown in the FEM.
The basic calculation steps in the MPM are demonstrated through the series of schematics shown in Figure 1. are mapped to the background mesh at Step (b) and these are assembled into a system of equations at the nodes (shown by the white-filled circles) in Figure 1(C).
Step (c) also includes assembly of the equivalent nodal forces associated with external forces held at material points (such as body forces) and surface tractions into a global vector associated with the active nodes, that is, the nodes that are attached to elements that contain material points (the dark grey shaded elements in Figure 1(C)). This system of equations is solved for the displacements at the nodes of the background mesh in Step (d), where the deformed mesh is shown by the back lines and the original mesh by the dashed grey lines. However, it should be noted that this is a nonlinear problem and, when using an implicit MPM, it is likely that multiple iterations will be required over Steps (b) through (d) in order to find equilibrium at the deformed state. Explicit MPMs do not attempt to satisfy equilibrium at the end of the load/time step and therefore Steps (b) through (d) are only evaluated once, albeit with much smaller load/time steps compared to implicit approaches. Once equilibrium has been satisfied the nodal displacements are mapped to the material points, as shown for a single material point in Figure 1(D). This step is often referred to as the convection of material points through the background mesh; this is the only Eulerian step in an otherwise Lagrangian algorithm. The variables held at the material points, such as stress and other history-dependent variables, are also updated at this stage. Finally, in Step (f), the background mesh is reset or replaced with a new mesh and the analysis goes back to Step (a). This approach allows very large deformation of the material points while maintaining a high-quality mesh for computation in each step. The MPM can suffer from accuracy issues which are induced by: (i) stress oscillations as material points move between background grid elements where there is a sudden transfer of mass, stiffness or internal force, (ii) material points not being at the optimal locations in elements for accurate numerical integration and (iii) a difference in the number of material points and mesh nodes for mapping of information between the background grid and the material points. Several different approaches have been proposed in order to reduce the impact of (i), including: (a) modification of the spatial gradient of the shape functions, or the adoption of alternative basis functions, to ensure continuity across element boundaries; 20-22 (b) a Moving Least Squares (MLS) MPM, 23 where the stress solution at the material points is smoothed using an MLS approximation and is then integrated using conventional Gauss quadrature, an approach that has the advantage of also reducing the impact of (ii) as optimal integration is recovered; and (c) domain-based MPMs [24][25][26][27][28] where each material point has an associated line, area or volume, in one dimensional, two dimnsional (2D) and three dimensional (3D), respectively, and the basis functions that link the material point to the background grid nodes are obtained by the convolution of a material point's domain with the shape functions of the finite element grid. This results in a smooth transition of internal force and stiffness between background elements as a material point crosses an element boundary.
For (iii), Gritton and Berzins 29 and Tran and Sołowski 30 proposed temporal and null-space filters to avoid loss of information when mapping between mesh nodes and material points when there is a mismatch between the number of nodes and material points (specifically when the number of material points exceeds the number of nodes which is often the case in a material point analysis).
As is evident from the references above, much of the MPM literature has to date focused on various impacts of grid element crossing on the accuracy, and convergence, of the method. However, there are other important issues in terms of accuracy of the MPM that have received relatively little attention/investigation. One is volumetric locking, which is a common difficulty for the numerical analysis of incompressible or nearly incompressible materials, for example, rubber, biological soft tissues, plastically deforming metals and soils. As demonstrated by Figure 2 for FEs, incompressibility will resist volume change of elements and then result in erroneous, overly stiff, global force-displacement predictions. In addition to this over-stiff global behavior, volumetric locking also manifests itself in terms of severe oscillations of the stress field within and/or between elements (checker-boarding), especially in terms of the mean stress. Both of these artifacts are caused by each element trying to satisfy the volumetric constraints placed on it by the element's integration points, and if the number of constraints exceeds the possible deformation modes the element will lock. Several FEM approaches have been published for avoiding volumetric locking and these are discussed in detail in Section 3. However, the issue of locking in the MPM has received very little attention to date. Notable exceptions include: Mast et al. 31 who proposed a Hu-Washizu multifield variational principle-based approach to overcome volumetric and shear locking in MPMs, Chen et al. 32 who adopted a two-field (v-p, velocity-pressure) variational principle to avoid volumetric locking for weakly compressible problems, Jassim et al. 33 who detailed an enhanced volumetric strain approach for explicit MPMs for coupled problems and Coombs et al. 34 who developed an F approach 35 to avoid volumetric locking for the standard and generalized interpolation MPMs for linear quadrilateral and hexahedral elements. However, the approach of Coombs et al. 34 can only be applied to quadrilateral/hexahedral elements and the paper only showed results for structured grids aligned to the global Cartesian coordinates, the method proposed by Mast et al. 31 was only demonstrated for linear material behavior and Chen et al.'s 32 approach was focused on fluids and fluid-elastic structure interaction. In addition to this, Mast et al.'s 31 method significantly increases the size of the linear system and introduces variables that have no physical meaning. Although the approach of Chen et al. 32 only introduces one additional degree of freedom for each background grid node, it also relies on artificial modeling parameters. Finally, Jassim et al.'s 33 approach is restricted to explicit MPMs.
As discussed earlier in this section, having more material points within an element than the number of nodes associated with the element can lead to spurious artifacts other than volumetric locking, specifically the null-space issue investigated in References 29,30. In this paper we do not focus on correcting spurious oscillations caused by the null-space issue, but acknowledge that this issue may be present in our method as we are not fundamentally changing the way that stresses at material points are mapped to internal forces at nodes, at least not for all stress components. It is therefore plausible that nonzero values of physical quantities at material points could result in zero values of the same quantity at nodes after material point to node mapping-this is the null-space problem. It would be interesting to combine the volumetric locking mitigation approach detailed in this paper with the null-space filtering suggested in Reference 30, however this is outside of the scope of this paper.
The motivation for the research presented here stems from the authors' desire to simulate 3D soil-structure interaction problems using the MPM that have no realistic 2D approximations, such as seabed ploughing and the installation of helical (or screw) piles. 36,37 The complex geometry of both seabed ploughs and helical piles makes an unstructured tetrahedral background grid the most sensible choice, especially as it allows local refinement around points of interest/geometric features. To this end, the major contribution in this paper is to tackle volumetric locking in the MPM via an extension of the F method so that it can be applied to the MPM with simplex elements (linear triangles and tetrahedra) which we call the F-patch MPM. A secondary contribution of this paper is to propose strategies to then speed up these complex three-dimensional MPM simulations, including: (i) an efficient algorithm for locating material points in an unstructured mesh and (ii) parallelization of building and solving the system of equations, which is complicated by the nature of the F-patch method as will be explained. Most published MPM simulations use regular grids, in which locating material points is straightforward; however, it is a major challenge to efficiently locate material points in an unstructured mesh, and searching element-by-element, checking whether a point in inside a triangle or tetrahedron can be very time consuming. In this paper we adopt a Walking-in-Triangulation (WiT) 38 approach to locate material points and show this to be very efficient. The two developments produce a variant of the standard MPM that is both accurate and efficient for large 3D modeling.

MATERIAL POINT FORMULATION
In this paper we adopt an implicit solution strategy for quasi-static large deformation elasto-plasticity with a multiplicative decomposition of the deformation gradient 39 and assume a linear relationship between Kirchhoff stresses and logarithmic (or Hencky) strains. This is one of the most straightforward ways to implement large strain elasto-plasticity as it recovers the conventional format of small strain elasto-plasticity constitutive models. For the sake of brevity details of the large deformation framework are not given here, instead the interested reader is referred to References 27,34 for more information. The authors have also recently produced an implicit MPM learning environment AMPLE which is fully described in Reference 19 that uses the same continuum formulation as this paper.

Updated Lagrangian formulation
The majority of material point formulations adopt an updated Lagrangian weak statement of equilibrium * , which, for quasi-static analysis, can be expressed as through the body. The weak form is derived in the current frame assuming a field of admissible virtual displacements, i . The Galerkin form of the weak statement of equilibrium over each background grid element, E, can be obtained from (1) as where [∇ x S vp ] is the the strain-displacement matrix containing derivatives of the shape functions with respect to the updated (or deformed) coordinates. The first term in (2) is the internal force within an element and the combination of the second (body forces) and third (tractions) terms is the external force vector. (2) is nonlinear in terms of the unknown nodal displacements and in this paper we solve this non-linear equation using an implicit Newton-Raphson (NR) procedure. This requires (2) to be linearized with respect to the nodal displacements to arrive at the stiffness of each element in the background mesh where a ijkl = ( ij ∕ F km )F lm is the spatial consistent tangent modulus for a point within the element (see Reference 40, appendix A.1). In MPMs we approximate this element integral through a finite sum over a number of material points which track an associated amount of volume as they deform through the background mesh. The stiffness contribution of each material point can be expressed as where v p = det ( F ij ) V p is the volume associated with the material point in the spatial (updated) frame, F ij is the deformation gradient and V p is the volume associated with the material point at the start of the analysis. The internal force contribution of a single material point to the background mesh is In order to advance the nonlinear analysis the applied external loads are split into a number of load (or pseudo time) steps. Within each of these steps equilibrium is sought between the applied loads and the internal forces. The incremental deformation gradient within each step can be calculated using 27,34 where Δu i is the displacement increment within the current load step,X j are the coordinates at the start of the load step and n is the number of nodes that influence the material point. This allows the increment in the deformation gradient to be obtained from derivatives of the basis functions based on the coordinates of the nodes at the start of the load step. The spatial derivatives of the basis functions can subsequently be calculated using techniques in Charlton et al. 27 It is these spatial derivatives that are used in the strain-displacement matrix, [∇ x S vp ], when evaluating both the internal force and stiffness contribution of a material point. Once equilibrium has been found, the positions of the material points can be updated using

External forces and boundary conditions
The application of boundary conditions in the MPM is one of the method's primary diversions from the FEM as the boundary of the physical domain will, in general, no longer coincide with the background mesh. This has resulted in a number of papers focusing on the tracking of boundaries and the application of boundary conditions in the MPM. 37,[41][42][43][44] In this paper we adopt the moving mesh approach of Beuth 45 to adjust the background mesh so that it coincides with the imposed Dirichlet (displacement) boundary conditions. Inhomogeneous Neumann boundary conditions (nonzero tractions) are not considered in this paper but their application would not require any changes to the methods and techniques proposed here.

Basis functions
In this paper we adopt the standard MPM, with a background mesh comprising linear four-noded tetrahedra. Although more advanced domain-based formulations are available, we adopt the standard MPM as the focus of this paper is on overcoming volumetric locking for low order unstructured meshes rather than on addressing element (or cell)-crossing instabilities. The choice of the standard MPM here could be viewed as somewhat retrograde given the availability of more advanced domain-based MPMs as mentioned above. However, it is now becoming clear that for some of these more advanced methods, shortcomings exist for certain forms of deformation. In addition to this, many domain-based formulations, such as GIMP, are not applicable to unstructured grids and therefore cannot be applied in this case. A recent detailed study of these limitations appears in Reference 46.

VOLUMETRIC LOCKING
As commented on in Coombs et al., 34 the MPM is susceptible to volumetric locking due to the combination of lower-order elements, which are adopted for stability reasons, and a large number of material points in order to reduce the integration inaccuracies associated with nonoptimum placement of integration points. When we add to these near-incompressible material behavior, such as in isochoric elasto-plasticity, the problems of volumetric locking and checkerboarding of the mean stress become severe. It should be noted that the use of a large number of material points does not itself lead to the occurrence of locking. Since the calculations in the standard MPM are fundamentally FE-based, the same rules for assessing locking likelihood, such as constraint counting can be used, 47 where the constraints come from the physical behavior of the integration/material points) and comparing this to the degrees of freedom of the mesh. Consider a mesh comprised of m linear four-noded tetrahedral elements (as used here) and n nodes (3n displacement degrees of freedom). Linear tetrahedra are constant strain elements and therefore they can only represent a constant stress or strain field; they only allow a constant change in volume over the element. If the material behavior within the elements is nearly incompressible (via a moderately large Poisson's ratio or isochoric plasticity) this will impose m constraints on the mesh, compared to 3n degrees of freedom. However, in tetrahedral meshes the ratio of the number of elements to the number of nodes is typically more than 5 to 1 (see p. 444 of Reference 48), imposing at least 5n constraints on a mesh with 3n degrees of freedom. This means that the element will lock irrespective of the number of integration (or material) points used to integrate the element. The element would still lock even if a single material point was used.

Locking mitigation approaches
In the FEM literature, several approaches have been proposed for avoiding volumetric locking with low-order elements, including: (a) reduced integration with quadratic or higher-order elements, for example, see Reference 49, where the order of quadrature is reduced in order to reduce the volumetric constraints placed on the element. However, care must be taken to avoid the introduction of spurious zero-energy modes and for the MPM this approach is not feasible as we do not know a priori the number of material points that will be in each element; (b) selective reduced integration 50 where the volumetric part of the stiffness and internal force is integrated using lower order quadrature compared to the rest of the deformation over the element. This approach has the advantage that it avoids the introduction of zero energy modes associated with shear deformation (such as hourglassing) while relaxing the volumetric constraint placed on the element. However, as with the above reduced integration approach, in the MPM we do not know the number and/or locations of the material points within an element a priori † ; (c) introduction of an additional pressure-volume energy-conjugate field in the variational principle to produce mixed elements, for example, References 51-53. Although feasible for the MPM, these approaches introduce an additional global unknown at each point/node, increasing the computational cost of assembly and solution of the global linear system; (d) enhanced assumed strain elements [54][55][56] which have the advantage that they can overcome both volumetric and shear locking of low-order elements through adoption of an alternative variational principle. The resulting element formulation can be expressed in terms of conventional nodal displacements (plus internal element degrees of freedom to allow displacement discontinuities between elements) as the other fields (stress and gradient of the displacement field) can be condensed out of the formulation. However, the element formulations are considerably more involved than a standard element formulation and require the inversion of a linear system at an element level to solve for the internal degrees of freedom. In addition to this, care must be taken to avoid hourglass instabilities within the formulation, especially under large strain analysis; 57-59 (e) geometrically nonlinear B-bar approaches 60 which directly modify the strain-displacement matrix of the element (effectively modifying the strain-energy function of the underlying material) using this modified matrix to determine the internal force and stiffness of the elements; (f) for an unstructured mesh with linear triangles or tetrahedra the nodal mixed discretization technique 61 (based on earlier work in Reference 62) has been successfully used to mitigate the influence of volumetric locking. This approach relies on different approximations for the volumetric and deviatoric components of the strain tensor, where the volumetric strain is determined via a two step process: (i) nodal values of volumetric strain are determined by a volume average of the volumetric strain in the elements attached to the node under consideration; (ii) element volumetric strain values obtained by taking the mean of the nodal values associated with the element. For problems involving dilative/contractive elasto-plasticity, a hydrostatic stress averaging and plastic stress correction process is also applied via: (i) obtaining volume average values at nodes, and (ii) averaging nodal values back to elements. The approach has been applied to the MPM, see for example References 15,33,45,63. However, this is fundamentally an incrementally small strain procedure and the generalization of the process to large deformation mechanics, and large deformations within a load/time step, is not clear. In addition to this, the linearization of this multistage, multimapping process, essential for proper convergence of implicit methods, is problematic. (g) F approaches which modify the deformation gradient assuming a different variation of the deviatoric and volumetric components of the deformation gradient across an element. For example, for the four-noded bilinear quadrilateral F element it is assumed that the volumetric component of the deformation gradient is constant over the element whereas the deviatoric component varies bilinearly according to the shape functions of the element. 35 While this approach was successful, the same technique cannot be applied to linear triangles and tetrahedra which later motivated the same authors to propose the F-patch method where the volumetric component of the deformation gradient is constant over a patch of elements. 64 In this paper we follow the approach of Coombs et al. 34 and adopt an F approach for the following reasons: (i) no additional global or local unknowns are introduced; (ii) it is straightforward to implement; (iii) it does not modify the stress-strain constitutive formulation, thereby not placing restrictions on the constitutive formulation and (iv) no additional empirical parameters are introduced. However, this paper is concerned with simplex elements and therefore adopts the F-patch method of de Souza Neto et al. 64 † One potential application of selective reduced integration could be within the improved MPM approach of Sulsky and Gong 23 where conventional Gauss quadrature is used to integrate the MLS-based stress field, however this is not the focus of the current work.

F-patch formulation
Following Reference 64, the essential idea of the F-patch method is to modify the deformation gradient for an element using data from a group of elements in a surrounding "patch". For a material point within a patch , the incremental deformation gradient is modified to where ΔF ij is the incremental deformation gradient obtained from (6), n D is the number of physical dimensions and v patch andṼ patch are the volume of the patch ‡ at the deformed state and the start of the load step, respectively. The deformation gradient of the deformed material point can then be obtained using the deformation gradient from the start of the load step, F n ij , using F ij = ΔF ik F n kj . As a result, the computation of the internal force and element tangent stiffness are also modified. For the internal force calculation, it is necessary to replace F ij by F ij when calculating the appropriate stress measure. Another important implication of the F-patch method is that a material point contributes stiffness to all of the elements in its patch, increasing the bandwidth of the resulting linear system of equations.
The stiffness contribution of a material point to its parent element, e, is § where v e is the volume of the element and, in three dimensions, [q] is the 9 by 9 matrix form of The stiffness contribution of a material point to the other elements, s, within its patch is where [∇ x S s vp ] is the strain displacement matrix of the element within the material point's patch, but not its parent element, e, and v s is the deformed volume of this element. The superscripts on [k es p ] denote that the material point contributes to the global stiffness matrix row of its parent element and the columns of the other elements within the patch. This point has important consequences for efficient parallelization of the method and will be discussed in more detail in Section 4.3. A method to specify patches within an analysis is explained in Section 4.1.

NUMERICAL IMPLEMENTATION
This section details some of the key aspects of the numerical implementation of the MPM described in the preceding sections.

Patch definition
Patches need to be defined that include sufficient elements to mitigate volumetric locking while limiting numbers to avoid spurious zero energy modes. 64 The procedure is as follows: 1 This process is shown schematically in 2D in Figure 3. A six-node triangular mesh is first generated (shown by the red and pink patches in Figure 3(A)), each element is then subdivided into four 3-noded triangular elements. In 3D, a 10-node tetrahedral mesh is first generated, and then each element is subdivided into eight linear tetrahedral elements ¶ , as shown in Figure 3(C). Unlike in the FEM where the elements in a patch stay the same throughout the analysis, in the MPM patches need to be updated due to activation and deactivation of elements as material points move in and out. This is shown in Figure 3(B), where the material points (in blue) have displaced such that five elements no longer contain material points. This means that in the current computation, the patches at the top left and right each comprise three elements and the patch at the top middle comprises one element.

Material point search algorithm
The second key aspect of implementation is the determination of the parent element, that is, the element in which the material point is located, at the beginning of each load step. For a regular grid, this can be directly computed based on the coordinates of material points and the grid size, but this is not the case for an unstructured mesh and a robust search algorithm is required. This problem is referred to in computational geometry as the "point location problem" (e.g., see chapter 6 of de Berg et al. 65 ) and it is recognized that this problem is still not fully resolved when considering more than two dimensions. For this 3D case, Devillers et al. 38 propose several alternative strategies by "walking" element by element and the method adopted here is similar to the most advantageous of these strategies. This problem is also referred to in numerical computation as the "particle-localization problem," for example, Reference 66. A similar approach to the ideas presented below for the MPM can also be found in Reference 67.
At the beginning of a simulation, the material points are generated at Gauss point locations in each element and therefore at this stage the parent element of each material point is known. In the following steps of an analysis, location is determined by choosing candidate elements and then checking if the point is inside each candidate. For the second of these checks, one approach is to use a signed volume check, that is, the point is inside an element only if the signs of volumes of all four new tetrahedra, in which one vertex is replaced by the point, are the same as the sign of the original element. This can be demonstrated via a 2D example in Figure 4: The sign of A 1q3 is different to that of the original element A 123 , therefore the point q is outside of the element.
This algorithm is reliable but the cost of computing four matrix determinants at each check is high. A 2D precursor to the MPM (the FLIP method) used the natural (or local) coordinates of material points with respect to an element under the isoparametric mapping for location purposes and this has been adapted to provide a cheaper method here. The signs of local coordinates, and their combination, can be used to determine the relative location of a point with respect to the element, as demonstrated in Figure 4(C), using the 2D case for clarity. With the mapping into isoparametric space, we can determine the local coordinates, ( , ), of any point, (x, y), with respect to an element by (13).
where (x i , y i ), i = 1, 2, 3 are the coordinates of the vertices. The Equation (13) is derived from where Then, as shown in Figure 4 This method has several advantages: notably, [T] in (13) only depends on the coordinates of the vertices, so it can be reused to check other material points and for efficiency, [T] −1 can be computed with an explicit formulation based on Cramer's rule and then stored for each element. Once a point is located inside an element, its local coordinates can be reused to compute the basis functions for later computations in the MPM, for example, mapping displacements from nodes to material points. Another advantage of this method is that we can get the relative position of the point with respect to the element, which provides a search direction for the other check, that is, selecting candidate elements, as explained below.
For the MPM, while large deformations can be modeled in an overall analysis, step-to-step movements are likely to be small and only elements in the vicinity of the previous parent element should require checking as candidates for the new location of the point. A major improvement in efficiency can be gained by only checking elements in the direction indicated by the local coordinates check outlined above. This method, similar to the WiT procedure, 38 is demonstrated in Figure 5 for 2D. Assume the coordinates of a material point are updated to {x} n+1 from {x} n at the end of the nth step. At the beginning of the (n + 1)th step, to find the parent element of the point, we first select the previous parent element as the candidate, and find the edge, which separates the point from the element (the blue line in Figure 5). Hence, the neighbour element with respect to this edge become the next candidate. The local coordinate check is then carried out and if necessary the procedure repeated until condition IV is met. This procedure is given in Algorithm 1. The success of this algorithm depends on efficiently obtaining all neighbours of an element. In the MPM used here, this neighbors list is built only once at the beginning of the simulation, because it is assumed that the connectivity of the mesh is not changed # . A tetrahedral element has four neighbors, one associated with each face. Assuming there are n tetrahedrons in a mesh, then a matrix of dimension n × 4, [NE], can be used to store the neighbours list. If a face is a boundary, its neighbour element ID is set to be −1. The algorithm to build [NE] is given in Algorithm 2 ( Figure 6).

Parallelization
Having explained the F-patch method and the searching algorithm we now focus on the final component to enable large scale analysis, parallelization. The method has been developed using routines in the Portable, Extensible Toolkit for Scientific Computation (PETSc 68 ) notably those for solution of the system of nonlinear equations in parallel, which work by distributing vectors, for example, the out-of-balance force { f oobf } , and matrices, for example, [K], to separate processes for the solution of the linear system in parallel. For standard finite elements, the distribution is achieved by partitioning the mesh using well-established algorithms that balance load on processes (e.g., METIS 69 ). These partitions are not, # Note that the element neighborhood list could be reconstructed if the original mesh was replaced. however, necessarily formed of geometrically contiguous collections of elements. So, the issue we meet here is the need to ensure patches of elements that are contiguous for the F-patch method and not split across processes. The solution is to constrain the partitioning algorithm to operate on patches, instead of individual elements. In that way, we ensure that the stiffness associated with an element has optimum access to information about all other elements in the same patch, as required for the F-patch method, specifically for efficient calculation of Equation (12). For the 2D case, for clarity, the patch decomposition is shown in Figure 7(A) over three processes, and Figure 7(B) shows the further subdivision of patches into elements.

NUMERICAL EXAMPLES
This section presents several numerical examples to show the performance of the parallel F-patch MPM, including: (1) footing problems with small and large deformations to demonstrate that volumetric locking is avoided and stress oscillation reduced; (2) a simple stretch problem with material points crossing element edges, where anticipated stress oscillations are reduced, indicating that suppressing volumetric locking may be more important than element-crossing; (3) double-notched plate and vane shear problems to provide evidence that volumetric locking is avoided and stress oscillation reduced, and (4) where y is the yield strength of the material and = √ 2J 2 , J 2 = 1 2 s ij s ij is the second principal invariant of the stress deviator tensor, s ij = ij − ij kk ∕3 and ij is the Kirchhoff stress. In order to apply the Dirichlet boundary conditions, the moving mesh strategy 36,45 is used. Many of the examples presented in this section use 11 material points per initially populated background element, which may seem a large number in terms of integrating over a linear tetrahedral element. This number of material points is adopted to reduce the oscillations caused by cell-crossing instabilities when analysing large deformation problems, as the volume associated with each material point, and therefore the associated jump in internal force, reduces with increasing numbers of material points. The impact of changing the number of material points is explored as part of Section 5.5.

Footing with small deformation
The first example is a 3D small deformation footing problem which was also studied in Reference 27. Because of symmetry, only a quarter of the model is considered, as shown in Figure 8 The magnitude of the vertical reaction force is plotted against displacement in Figure 8(C) showing that the force erroneously increases with the standard MPM after the material has yielded. This error is due to volumetric locking, and is avoided by the use of the F-patch MPM. In addition, we find that stress oscillation is significantly reduced with the F-patch MPM, as shown in Figure 9.

Large deformation elastic indentation
The same physical domain as in Figure 8(A) is considered in this section, with the same discretization and number of material points per element, but with large deformations and elastic material behavior. Since we have demonstrated the locking avoidance and stress oscillation reduction for an elasto-plastic material in the previous example, this example demonstrates the performance for the case of nearly incompressible elastic material with E = 10 GPa and = 0.48. As shown in Figure 10, an extended empty mesh or moving mesh is included on the top of physical domain, which is used for handling material points during simulation with large deformations. The boundary condition includes rollers on all vertical and bottom faces, and the displacement boundary condition is applied on the footing area via the moving mesh strategy so that the extended mesh moves with the footing area but the other part of mesh is compressed. Therefore, we always explicitly have the footing area in the mesh, where the displacement boundary condition is directly applied.
In the simulation, an automatic step size strategy is used with the initial incremental displacement, Δu = 0.1 m. With this strategy, Δu is doubled in the next step if the current step at this increment size successfully converges, otherwise the step is repeated with half Δu. Figure 11, shows the large deformations modeled at the footing and a representative stress field is shown for both the standard and F-patch MPMs once again illustrating the change in oscillatory stress behavior. The expected stress concentration at the inner corner of the footing area is obtained with the F-patch MPM as observed in Figure 11(B). The reaction force in the direction normal to the footing area is plotted against the applied vertical displacement in Figure 11(C). The variable interval between markers shows the variable step size used. The response using the F-patch MPM is smoother than that with the standard MPM, due to the differences in the stress fields. The F-patch MPM predicts a lower force than the standard MPM because of the removal of volumetric locking, although the difference in terms of the reaction force between the two methods is less significant than the previous problem due to the material having a less onerous constraint on the volumetric behavior of the material. A more significant difference can be seen the stress fields shown in Figure 11

Simple stretch
In order to further confirm the observation that the F-patch MPM eliminates stress oscillations in solutions, a third example is presented, that is, the uniaxial stretch of a cuboid domain. The initial edge length of the domain is set to 2 m, and the material parameters are E = 10 3 Pa, = 0.0 and y = 400 Pa. The domain was discretized into five 10-noded tetrahedral elements, that is, five patches for the F-patch computation, which were then subdivided into 40 linear tetrahedra (eight per initial 10-noded tetrahedron). Each element was initially populated with 11 material points per element. Roller boundary conditions were applied on three surfaces, x = 0, y = 0, z = 0 , and an incremental displacement of Δu = 0.4 m per load step was applied on the surface, y = 2 m, as shown in Figure 12. The moving mesh strategy used here simply stretched the mesh along the y-axis direction such that the boundary of the physical domain aligned with the mesh. The mesh and material point distribution at a representative point in the analysis (the 17th load step) is shown in Figure 12(B), at which point the y = 2 m boundary has been displaced to y = 8.8 m. The Kirchhoff stress component, yy , at all material points is plotted against the load step in Figure 13. The average value agrees with the analytical solution but oscillations clearly exist within the domain, as shown by the red-and yellow-shaded regions that represent the range and first quartile of the stress values at the material points, respectively. These oscillations are more pronounced in the results from the standard MPM and are much reduced by using the F-patch approach. This occurs because the stress state of a material point is influenced by other elements in the same patch. This reduces the sudden transfer of internal force between nodes as a material point crosses from one background element to another (provided that they are in the same patch).
In order to explore the impact of element aspect ratio on the the performance of the F-patch approach, the simple stretch problem was reanalysed with increasing numbers of element subdivisions of the background mesh in the x and z-directions but keeping two element subdivisions in the y-direction, as shown in Figure 14. In all cases the elements were initially populated with 11 material points. The same boundary conditions described previously were used with displacement of 2 m applied over 5 load steps. Table 1 presents the normalized stress error for the meshes shown in Figure 14, where V = ∑n mp p=1 V p is the initial volume of the problem (8 m 3 in this case), V p is the initial volume associated with each material point and a = 400 Pa is the analytical normal stress solution in the y-direction for a displacement of 2 m. Despite the stress oscillations caused by cell crossing after the onset of elasto-plasticity, the error converges with mesh refinement for both the standard and F-patch MPMs. The errors for the F-patch MPM are consistently around 70% of the errors for the standard MPM, irrespective of the aspect ratio of the elements. This demonstrates that the F-patch approach is relatively insensitive to the aspect ratio of the background elements.

Double-notched plate
The stretch of a double notched plate is the next example, as studied in many other papers, for example, Reference 34. Because of symmetry, only one-eighth of the problem was considered. The geometry is shown in Figure 15. Different meshes with element edge sizes h = 2, 1, 0.5 mm were used, with 11 material points per element at the start of the analysis. The material parameters used were Young's modulus, E = 206.9 GPa, Poisson's ratio, = 0.29 and yield strength, y = 0.55 GPa. A Displacement was applied to the top edge of the plate with an incremental value 0.01 mm per load step. The moving mesh strategy was adopted such that the mesh was simply stretched with the deformation of the top surface of the plate. The convergence in the NR iterations of the F-patch MPM is of particular interest here. Figure 16 shows the magnitude of the out of balance force for iterations in each step. Each load step converges in four to five iterations, showing a satisfactory convergence behavior. The magnitude of the reaction force in the stretch direction is plotted against the displacement in Figure 15(C). Similarly to results in Reference 34, the reaction force reduces as the mesh is refined and values are larger with standard MPM than with the F-patch MPM, due again to over-stiffness associated with volumetric locking. It is interesting to observe that the difference in the maximum reaction force of the standard and F-patch MPMs seems to be reducing with mesh refinement. This is contrary to the literature associated with the development of the F patch method for finite element analysis. For example, de Souza Neto et al. 64 explored this point via the analysis of a nearly incompressible elastic Cook's membrane with standard and F-patch linear triangles. Although the paper does not present the results in terms of convergence rates, analysis of their data shows that the F-patch approach maintains the convergence rate of the underlying parent element (a convergence rate close to 2 in terms of displacement error for a linear triangle), whereas the convergence rate of the standard element reduces to ≈0. 16. Therefore the effectiveness of the F-patch approach increases with mesh refinement in terms of normalized displacement errors, at least for the nonlinear elastic problem examined by Reference 64. However, the results in this section are for an elasto-plastic problem, where the nearly incompressible material behavior is only associated with the part of the domain undergoing elasto-plastic deformation. The double-notched plate includes stress concentrations associated with the partially constrained lower boundary (the edge of the notches), which will lead to localized elasto-plastic regions. As we do not consider higher-order continuum theories, the size of these regions, and therefore the size of the nearly incompressible regions, will reduce with mesh refinement and this may explain the difference between the results presented in this section (which are consistent with those presented by Coombs et al. 34 ) and the nonlinear elastic results of de Souza Neto et al. 64 The magnitude of Cauchy stress on material points in the deformed configuration in Figure 17 shows that the stress oscillation with the F-patch MPM is lower than with the standard MPM.

Rotation of vane slice
The shear vane is a standard geotechnical test to obtain an undrained soil strength. In the test, the vane, as shown in Figure 18(A), is pushed into the ground, rotated and the torque recorded. The experimental data of applied torque and rotational angle can be used to define the soil strength with an empirical equation providing some simple assumptions are made as to the failure mode, for example, a cylindrical shear surface. This test has been numerically simulated with the FEM as a 2D plane strain problem in Reference 49 and here we present the results of simulations for a "slice" of the vane, which is similar to the 2D problem in Reference 49. The geometry and mesh for this simulation is shown in Figure 18 The summation of the torque at all nodes on the vane blade surfaces against vane rotation angle is plotted in Figure 18(C). A varying number of material points per element were used (1, 4, and 11, etc.) but the torque rotation plots were the same. The torque with F-patch MPM plateaus after the soil yields but not with the sMPM, showing the effect of volumetric locking. There is a difference between the torque with F-patch MPM and that estimated empirically in Reference 49 because the real shear stress failure surface is not cylindrical as shown in Figure 19.

Rotation of the full vane
This section presents a simulation of a shear vane test with the full geometry. As shown in Figure 20 The torque is plotted against the vane rotation angle in Figure 20(C) showing that the F-patch MPM appears to remove the volumetric locking. Comparison of the torque here to that for the simulation of the vane slice shows that the torque is larger and also further from the value estimated empirically. This is because the full vane is simulated here, and the torque has contributions from both blade surface resistances and blade edges at the embedded end. The significant difference suggests that real 3D simulation is needed for accurate modeling of this test.

Efficiency
Having demonstrated the accuracy of the F-patch MPM we now look at efficiency through analysis of simulation times. All results presented here were obtained on a workstation with two physical CPU chips (Intel Xeon E5-2630 v3 @ 2.40GHz), with eight cores per chip, giving 16 processors in total. The time for sequentially simulating rotation of the full vane in Section 5.6 is shown in Figure 21. The Figure 21(A) shows the simulation time and number of iterations for each load step. The first load step for an elastic deformation with geometric nonlinearity needs three iterations, and after the fourth load step five iterations are needed as the material yields. This indicates that the simulation time is proportional to the number of iterations. To confirm this, the time for one iteration is computed in each load step and plotted in Figure 21(B), where the mean of these times over all 10 load steps is also plotted. The time for one iteration is almost the same for different load steps, with less than 2% difference to the mean. This indicates that the total simulation time can be estimated as the number of iterations multiplied by the time for one iteration. Therefore, a simple example with one iteration is firstly used to perform the efficiency analysis, followed by performance analysis for simulations of the examples above in Sections 5.2-5.6. The one iteration example is the analysis of an elastic cube subjected to simple stretch. As shown in Figure 22, a cuboid domain with initial edge length 2 m was discretized by a mesh with 6356 nodes and 31,328 elements, with 1 material point per element. The simulation time for one load step with one iteration, given in Figure 22 shows that about 73% of simulation time is used for locating material points without using the WiT algorithm and with 1 processor, indicated by nW-P1. However, this time is significantly reduced with the WiT algorithm, indicated by W-P1, from 169.37 to 0.092 s. To further improve simulation speed, multiple processors were used for computation of the tangential stiffness matrix or Jacobian, [K], and out-of-balance force vector, {f oobf }. With two processors, indicated by W-P2, the total simulation time is larger than that with one processor. This surprising result occurs because some time is needed for communication between processors. The ratio between cost of communication and of computation is high, because only one material point per element is used. With more processors, see W-P4 and W-P8 in Figure Ideally, the value of speedup ratio is N, the number of processors. As shown in Figure 21, the total simulation time is proportion to the number of iterations and independent of the load step. The simulation time for the first load step is used to compute the speedup ratio. Figure 23(A) shows the speedup ratio for the double-notch plate problem with different mesh refinement in Section 5.4, (b) for the footing with large deformation in Section 5.2, the rotation of the vane slice in Section 5.5 and the full vane in Section 5.6, (c) for the rotation of vane slice with different numbers of material points per element. Figure 23(A) shows that the speedup ratio is almost ideal for all simulations with different mesh refinements. Figure 23(B) shows that the speedup ratios for the footing problem are close to ideal values. However, the simulation time with two processors is more than with one processor for the rotation of vane, which is similar to the observation supported by Figure 22. This is because one material point per element is used in these simulations, and 11 material points per element in others. Recall that the decomposition of the computation is based on a patch mesh, which determines the cost of communication between processors. As the number of material points per element increases, the cost of computation increases. Therefore, as the ratio between the cost of computation and that of the communication increases, the speedup ratio will approximate to the ideal values, a conclusion supported by Figure 23(C), in which the mesh is fixed but material points per element varies with values 1, 4, and 11. The F-patch approach will increase the cost of a material point analysis compared to the standard MPM and the major contribution to this difference is in the assembly of the linear system of equations based on information at the material points. This is due to the fact that for the F-patch approach each material point not only contributes stiffness to its parent element but also to all of the elements within its parent element's patch. For the way that patches are created in our method, this means a maximum of eight times the number of stiffness entries are generated for each material point -12 2 terms for the parent element and 7 × 12 2 terms for all of the other elements in the patch. In terms of run times this approximately doubles || the cost of the global stiffness calculation compared to the standard MPM. As the run time of the stiffness assembly scales linearly with the number of material points, the approximately doubling of the run time increase will remain constant for different analyses. Although a doubling of the run time for the stiffness calculation is significant, the trade off is meaningful results that are not possible with the standard MPM. There will also be an increase in the solver time due to the additional connectivity within the global stiffness matrix, however this is insignificant compared to the additional assembly cost.

CONCLUSIONS
The MPM is acknowledged to be an exciting development in the analysis of large deformation solid mechanics problems however for more widespread adoption for realistic problems, developments are needed to deal with issues such || 2.05 × of the run time for the considered test case.
as cell-crossing errors, null-space deficiencies and volumetric locking. In addition, the method needs to be easy to use when the background mesh is large, unstructured and 3D. In this paper our goal has been the development of such a method. The key novelties are: (1) we have shown that the oscillation of the standard MPM with simplex elements is also due to volumetric locking rather than solely cell-crossing error (the stress fields have been converted from random garbage to physically reasonable results for all of the problems considered in this paper); (2) we have adapted an effective existing idea, the F-patch method, for solving the volumetric locking issue in the MPM; (3) we have developed a novel approach for improving the computational efficiency of the material point-element search. For a typical problem an 1800 times speed up was achieved compared to a naive searching approach, and (4) a demonstrably effective parallel strategy for this proposed F-patch MPM is demonstrated. A number of example problems have been presented to demonstrate the accuracy of the proposed MPM and studies of simulation times have demonstrated good performance in parallel.