Robust interpolation for dispersed gas-droplet flows using statistical learning with the Fully Lagrangian Approach

A novel methodology is presented for reconstructing the Eulerian number density field of dispersed gas-droplet flows modelled using the Fully Lagrangian Approach (FLA). In this work, the nonparametric framework of kernel regression is used to accumulate the FLA number density contributions of individual droplets in accordance with the spatial structure of the dispersed phase. The high variation which is observed in the droplet number density field for unsteady flows is accounted for by using the Eulerian-Lagrangian transformation tensor, which is central to the FLA, to specify the size and shape of the kernel associated with each droplet. This procedure enables a high level of structural detail to be retained, and it is demonstrated that far fewer droplets have to be tracked in order to reconstruct a faithful Eulerian representation of the dispersed phase. Furthermore, the kernel regression procedure is easily extended to higher dimensions, and inclusion of the droplet radius within the phase space description using the generalised Fully Lagrangian Approach (gFLA) additionally enables statistics of the droplet size distribution to be determined for polydisperse flows. The developed methodology is applied to a range of one-dimensional and two-dimensional steady-state and transient flows, for both monodisperse and polydisperse droplets, and it is shown that kernel regression performs well across this variety of cases. A comparison is made against conventional direct trajectory methods to determine the saving in computational expense which can be gained, and it is found that $10^3$ times fewer droplet realisations are needed to reconstruct a qualitatively similar representation of the number density field.


INTRODUCTION
Gas-droplets flows are ubiquitous within science and engineering, and are central to many industrial applications and environmental phenomena, including facilitating the decarbonisation of propulsion systems for transportation 1 , determining the spread of airborne droplets containing a pathogen 2 , and modelling droplet condensation within atmospheric clouds 3 . Of primary interest in the characterisation and description of such systems is the variation of droplet number density throughout the flows, with this being a key determinant in physical factors such as reaction rates and probability of infection. The mechanisms that are responsible for the variation in droplet number density have received much attention, with studies considering both the influence of the carrier flow structures and droplet physics upon the clustering and dispersion of droplets.
Whilst experimental studies continue to be an important informant of droplet behaviour through the provision of quantities such as the distribution of droplet velocities and sizes, it is useful to be able to study specific flow configurations without having to use an experimental setup. Mathematical and numerical modelling is able to fulfil this role with the aid of computational simulations, and provides a wealth of detailed information that can be difficult to obtain experimentally, such as knowledge of the instantaneous droplet number density field. Such modelling is able to provide detailed insight into the physical phenomena which occur in droplet-laden flows, and constitutes a topic of growing research interest.
Within the realm of modelling gas-droplet flows two distinct approaches exist for handling the droplet phase. The first treats the droplets as a continuum in a similar manner to the gas carrier flow, and solves a set of coupled partial differential equations which govern the evolution of the droplet phase field variables, specifically the number density, mean velocity, and kinetic stresses. By its nature, the resolution of this approach is limited by the fidelity of the computational grid upon which the equations are solved, and consequently the resultant Eulerian description of the droplet phase is restricted in the level of detail that can be represented 4 . In contrast, droplets can instead be treated individually, with the equation of motion for each droplet being solved along their separate Lagrangian trajectories. This method is able to capture a considerable level of detail, with the behaviour of droplets in different regions of the flow being accurately accounted for through their individual treatment, however the drawback to the Lagrangian approach lies in construction of the Eulerian droplet number density field from the trajectories. In order to obtain statistical convergence of the resulting Eulerian description, the number of droplets required is often prohibitive, being of the order of (10 3 ) in every grid cell for a well resolved flow 5 .
A point of contention between the Eulerian and Lagrangian descriptions of the droplet phase is in the capturing of important physical phenomena such as the crossing trajectories effect 6 , in which the non-negligible inertia of droplets can cause their trajectories to intersect in physical space. This has the consequence of the droplet phase field variables becoming multi-valued; a consideration that is important to account for in the modelling approach. Such instances of multi-valuedness can be observed in the structure of the dispersed phase continuum where they manifest as the envelope of intersecting trajectories, which is referred to as a fold. Eulerian descriptions of the droplet field are by construction single-valued at a given point in space, and therefore unable to include the full detail of the crossing trajectories effect in their standard form. To address this, studies have sought to account for this behaviour directly within the Eulerian representation of the droplet phase field variables by means of developments such as a family of moment methods 4 and multi-fluid models 7 . In contrast, the Lagrangian description of droplets along their trajectories is naturally able to handle the intersection of trajectories, and therefore also the ensuing multi-valued nature of the droplet field, making it an ideal candidate for the development of further models.
To utilise the detail intrinsic to the Lagrangian description and address the issue of needing to simulate prohibitively high numbers of droplets, a body of work has been developed upon the assumption that the droplet phase can also be treated as a continuum with no self-stresses 8 . This has been pioneered by Osiptsov 9 , and involves tracking the number density along individual trajectories by utilising the conservation of mass in Lagrangian form, a procedure known as the Fully Lagrangian Approach (FLA) (some works refer to this as the Fully Lagrangian Method, or Osiptsov's Method). A number of subsequent studies have built upon this foundation, including its application to a gas flow through a turbine 5 , examination of the moments of droplet number density 10 , comparison to experimental spray data 11 , applicability to turbulent flows 12 , inclusion of momentum coupling effects 13 , and most recently, extension to the consideration of polydisperse flows 14 . Despite these advances, the FLA remains a developing concept which presents a variety of challenges within its use, chief of which is the accurate reconstruction of the droplet number density field from trajectory data.
In this paper, the use of a statistical learning approach is proposed to accumulate the contributions from individual droplets, and reproduce the Eulerian number density field in a robust manner which is applicable across a range of different flows. To accomplish this, kernel regression is used, with the averaged nature of this procedure enabling it to account for the multi-valued nature of the contributions along folds, and thereby construct a single-valued representation of the number density field. This extends the linear interpolation approach used in a previous study 14 , and makes accurate representation of the dispersed phase-field possible for a wider class of flows. The developed methodology is applied to a range of one-dimensional and two-dimensional steady-state and transient flows, for both monodisperse and polydisperse droplets, and it is shown that kernel regression performs well across this variety of cases.
The remainder of the paper is structured as follows. Section 2 outlines the formulation of the Fully Lagrangian Approach, and details the assumptions involved. Section 3 introduces the concept of kernel regression, how it can be applied to the calculation of number density, and how the information provided by the Fully Lagrangian approach can be used to ensure contributions from trajectories are accumulated in a physically consistent manner. It is further demonstrated how the procedure can be generalised into a higher-order phase space that includes droplet radius to enable modelling of polydisperse flows. Additionally, aspects of the computational implementation are detailed. Section 4 describes application of the kernel regression approach to a wide class of flow configurations and presents the numerical results. Section 5 appraises the observed simulation results, and discusses the potential of the procedure for application to more complex flows in terms of computational efficiency.

General formulation for polydisperse droplets
Representation of the dispersed phase in a Lagrangian sense is considered through the use of an appropriate governing equation for individual droplets or particles. Henceforth, in this paper droplets are taken to be liquid and can be subject to evaporation, whilst particles are assumed to be solid and non-evaporating; the more general case of droplets is usually focused upon. In this work, droplets are considered to be spherical, within a dilute suspension, and are also assumed to act as point masses. Under these conditions, droplets with position ( ), velocity ( ) and radius ( ) are modelled individually along trajectories according to the arbitrary governing laws and initial conditions at time 0 where the subscript denotes variables that are sampled along the Lagrangian droplet trajectories, ( , , , ) is the force per unit mass acting upon a droplet, and ( , , , ) is the rate of radial change for a droplet with Eulerian position , velocity , and radius at time , and droplets have initial position 0 , velocity 0 , and radius 0 at time 0 . Without loss of generality, both and are assumed to be dependent upon all of , , , and , although may also be dependent upon other thermodynamic parameters which are related to the droplet evaporation.
The FLA is based on the assumption that the dispersed phase can be represented as a continuum 9 , with trajectories characterised by their initial position 0 and time . Recent work has generalised this concept by extending the phase space to include the dependence on initial droplet radius 0 , referred to as the generalised Fully Lagrangian Approach (gFLA) 14 . Then the dispersed phase probability density ( , , ) can be interpreted as the number density in this wider phase space, and is calculated directly along the trajectories of individual droplets by considering the Lagrangian form of conservation of mass 9 ( , , ) = ( 0 , 0 , 0 ) |det( ( 0 , 0 , ))| , where the Jacobian tensor ( 0 , 0 , ) is specified in block matrix form as 14 The procedure for deriving Eq. (2) directly from the behaviour of individual droplets described by Eqs. (1) is outlined in Appendix A, along with the implications of the physical assumptions involved. The physical interpretation of is the local elemental deformation along a trajectory with respect to its initial state at ( 0 , 0 , ). This is equivalent to the Eulerian-Lagrangian transformation that describes the influence of variables transported by droplets on the local Eulerian field, meaning that is able to provide information about the phase space deformation of the droplet field. The blocks of the Jacobian given by Eq. (3) represent this deformation in the associated part of phase space; is the spatial deformation given by the standard FLA, is the deformation in radial space, and the remaining blocks provide the corresponding cross components. The evolution equations for each block can be obtained by taking partial derivatives of the governing equations (1) with respect to ( 0 , 0 ), yielding the The general form of the initial conditions at time 0 pertaining to Eqs. (4) are where is the identity matrix. The system (4) then describes the evolution of the blocks of whilst accounting for the coupling that exists between the different blocks. Note that this coupling only occurs between blocks in the same column of the Jacobian in Eq. (3), with the different columns being associated with independent systems of equations. This formulation can also be extended to provide a more complete description of the droplet thermophysical behaviour by including temperature and mass as phase space variables instead of only the droplet radius. The initial conditions (5) are found by applying the Jacobian as defined in Eq. (3) to the initial conditions associated with the governing equations (1). It is assumed here that the initial droplet position 0 and velocity 0 are independent of radius 0 , and further that the initial droplet radius 0 is also independent of position 0 . Following 5 , however, it is noted that the initial droplet velocity 0 varies with position 0 depending upon both the nature of the carrier flow and how droplets are introduced into the flow, and identification of the correct initial condition for 0 ∕ 0 is crucial to obtaining the correct behaviour for the evolution of .
For the standard FLA, in which the droplet radius is not considered, the probability density ( , , ) reduces to the number density ( , ), which is a function of only physical space . For the monodisperse cases in Section 4.1 the analysis is accordingly focused upon ( , ), whilst the polydisperse flows in Section 4.2 consider the more general case of ( , , ).

Formulation for simplified physical models
Whilst the gFLA formulation in Section 2.1 is applicable to general equations of motion and evaporation for spherical droplets modelled as point masses in a dilute suspension, by further assuming that the carrier flow density is much less than that of the dispersed phase, the applicability is then restricted to gas-droplet flows. Making the additional assumption of a low droplet Reynolds number enables the droplet momentum to be modelled using a linear drag law. For a simplified physical model of evaporation, all heat at the droplet surface is taken to be spent on evaporation, then the droplet evaporation rate = ( , ) is only dependent on the radius , and independent of position and velocity . Under these conditions, the associated nondimensional forms of and from Eqs. (1) are given by where * 0 is a reference Stokes number corresponding to droplets of characteristic radius * 0 , ( , ) is the carrier flow velocity and is the rate of change of the droplet surface area. In this case the equations of evolution for system (4) simplify tö where ( ) = ( ( ), ) and ( )∕ = ( ( ), )∕ respectively denote that the fluid velocity and fluid velocity gradient are evaluated along the droplet trajectories. The initial conditions for the system (7) remain the same as those given by Eqs. (5).

Computational advantage
The most salient advantage of using the FLA framework to compute the number density field is the increase in computational efficiency over conventional box-counting methods which can be realised, principally due to effective use of the information provided by the Eulerian-Lagrangian transformation. This aspect of the method has been focused upon in several works, and stems from the observation that an accuracy of < 0.1% in calculation of the Eulerian number density field is only achieved using box-counting methods when (10 4 ) droplets are present in each Eulerian cell within a simulation 5,12 . In contrast, due to the ability of the FLA to provide the number density along trajectories, and the facet of the Eulerian-Lagrangian transformation that trajectory data can be meaningfully extrapolated onto the local spatial region, the FLA only requires one trajectory per Eulerian cell to achieve the same level of accuracy 12 . This advantage means that despite the additional computation of the Jacobian as it evolves along each trajectory according to Eqs. (4), the FLA is still approximately 20 times more efficient than boxcounting methods for two-dimensional simulations 5,11 . Another analysis found that a box-counting approach for number density requires ∼ 10 6 trajectories to be computed, whereas the FLA only needs ∼ 10 3 trajectories to obtain the same level of detail, however this is offset by the additional expense of calculating the Jacobian along trajectories which entails a fortyfold increase in computational cost for three-dimensional simulations, and also by linear interpolation of the trajectory number density data onto the Eulerian grid which takes almost as long as the numerical integration along trajectories 15 . Therefore, despite providing a 10 3 saving on the number of trajectories computed when compared against the box-counting approach, the overall efficiency improvement of the FLA for calculation of the Eulerian number density field is reduced to around tenfold. It is also noted that in implementations of the FLA, neglecting droplet collisions in the dilute phase limit results in the independence of trajectories, which further enables effective parallelisation and computational efficiency 16 .

Numerical considerations
The gFLA formulation in Section 2.1 is valid for the regime of dilute droplet flows, and makes the important assumption that the droplet field is continuous between trajectories and exhibits smooth spatial variation. Such an assumption is, however, at odds with the multi-valued nature that characterises the trajectory crossing experienced by inertial droplets. The strength of the standard FLA formulation is that it is able to retain this detail that is inherent to individual trajectories within its description of the spatial droplet field, however this is manifested in the elemental volume of the Eulerian-Lagrangian transformation, given by det( ( 0 , )), becoming zero at the point of trajectory crossings 5 . By virtue of Eq. (2), the droplet number density is then singular at these points. This phenomenon is in fact necessary to maintain consistency with the assumption of continuity for the dispersed phase, and is simply how the FLA framework incorporates the information of crossing trajectories into the concept of a smoothly defined field. It has further been demonstrated in previous work that such singularities in number density are integrable 17 , meaning that the local Eulerian number density field within a defined region remains finite 12 .
Notwithstanding the fact that the FLA presents a theoretically sound approach for number density calculation, in practice the issue lies in how to accumulate the multi-valued number density contributions along trajectories into an Eulerian field representation in a physically correct manner. An interpolation procedure is only valid within a region of the droplet field which is single-valued, meaning that careful treatment of the multi-valued regions is paramount to ensuring that the calculation of an Eulerian number density field is meaningful. Previous works have addressed this by using the fact that the sign of the Jacobian determinant changes every time a droplet crosses the trajectory of another droplet, effectively providing a means of keeping track of which layer of the droplet field a given trajectory is in. Since each layer of the droplet field is single-valued with nonintersecting trajectories, interpolation can be used to calculate the Eulerian number density within each individual layer, and then due to the number density field being additive, the contributions from each layer at a given point are summed together to obtain the total number density 12,15,18 . This requires that droplets are indexed in such a way that a distinction is made every time a fold is crossed, with the most straightforward means of doing this being to keep count of the number of times this occurs for each droplet, then all droplets with the same count index form one layer of the droplet field 18 . Such a procedure enables the reproduction of a number density field which although single-valued, is able to contain the detail of different layers within the droplet field; a feature which is difficult to replicate in Eulerian-based simulations 4 . The drawback of indexing droplets in this manner is that sufficiently many need to be present within each layer of the droplet field for the interpolation methods to be well defined and able to work correctly. In practice this is only an issue when trajectories initially cross and there are few droplets within the newly created layer of the droplet field, however this still presents a situation which requires appropriate treatment to ensure that the contribution of these droplets is accounted for in the number density calculation. Notwithstanding this, the physical accuracy which can be achieved through the use of a Lagrangian approach highlights the potential ability of the FLA to accurately simulate industrially relevant systems.

INTERPOLATION OF THE NUMBER DENSITY FIELD
The primary point of contention in developing useful computational tools which make use of the FLA is that the number density provided by the method is along trajectories, whereas in many numerical methods and applied contexts it is most often the Eulerian field information which is required. In this sense, the scattered number density data which the FLA produces requires the use of an appropriate interpolation procedure for accumulation onto an Eulerian grid. It has been noted that simple linear interpolation is sufficient for a steady-state flow configuration, however for transient flows, that contain vortices, a more comprehensive accumulation algorithm is required 16 . This is a subject which has received surprisingly little attention to date, yet has the potential to open up the FLA methodology to a wider and more general class of flows.

Existing procedures for number density calculation
The classical box-counting approach to calculating the number density field in dispersed multiphase flows simply involves the accumulation of mass contributions from all the droplets within an Eulerian cell, then weighting them by the cell volume. This is also referred to as the nearest neighbour method 19 , and can be interpreted as reassigning the contribution from a droplet to act at the centre of the cell in which it is located. Such an approach is associated with a discontinuous droplet weighting function, since it takes no account of how far a droplet is located from the cell centre, meaning that this information is lost in the accumulation process, and resulting in the generation of artificial noise. It is this noise which can cause a high variation in the computed number density field, and necessitates the requirement for such a high number of droplets using the box-counting approach in order to produce a stable value for the number density.
An improvement to the box-counting approach is achieved by considering a droplet weighting function with 0 continuity which is able to account for the location of a droplet relative to the Eulerian grid on which the accumulation of contributions is made. One procedure that utilises such a methodology is the Cloud-In-Cell (CIC) approach 20 , in which the weighting function takes the form of a triangular-shaped kernel. This is equivalent to assigning contributions from a droplet to the corner nodes of the Eulerian cell in which it resides by using linear interpolation, which importantly ensures that the constraints of continuity and momentum conservation are respected by the accumulation procedure, whilst also maintaining numerical stability. The CIC approach is a widely adopted method of accumulating contributions in dispersed multiphase flow simulations due to its simplicity of implementation, and, owing to the fact that the weighting function retains 0 continuity of the droplet field, requires fewer droplets than the nearest neighbour method to achieve a stable result for the number density field on an identical Eulerian grid. The domain of influence for the droplet weighting function for a Cloud-In-Cell approach is, however, naturally limited to the Eulerian cell in which the droplet is located. This means that many droplets per cell are still required to achieve statistical convergence of the number density field, and the computational expense associated with this can be prohibitive within simulations of industrial scale processes.
Within the broad class of meshfree methods which exist for interpolation of scattered data, kernel-based methods have seen a variety of applications, in particular for fluid dynamics in the form of smoothed particle hydrodynamics (SPH) 21 . Whilst the rationale for SPH is well established, the normalisation condition required of the weighting kernel is such that a sufficient number of particles must be inside the kernel for the accumulation procedure to remain valid, which has implications for the simulation of compressible flows in which the spatial distribution of particles is non-uniform. Although the SPH framework has been extended to handle weakly-compressible multiphase flows through the use of adaptive spatial resolution, this has only been shown to be accurate in the case where the density ratio between the two phases is less than 10 22 , whilst typical gas-droplet flows have a much higher density ratio. Consequently the ability to effectively treat the clusters and voids that are characteristic of dispersed multiphase flow using the SPH framework remains limited.
Previous work has highlighted the ability of Voronoï tessellations to calculate the droplet number density field from droplet location data, by recognising that the area of the Voronoï cells is the inverse of the local droplet number density 23 . Notably, it was demonstrated that it is possible to identify regions of clusters and voids within the droplet field through categorisation of the area of Voronoï cells. This methodology enables information about the Lagrangian dynamics of individual droplets to be included within the representation of the Eulerian number density field, meaning that a certain level of detail about the structures in the droplet field can be retained. However, since the influence of a given droplet is taken to be uniform over its associated Voronoï cell, the resulting Eulerian number density field is discontinuous along the Voronoï cell boundaries. In this sense, the construction of the number density field using Voronoï tessellations requires a sufficiently large number of droplets to obtain a smooth result, an aspect shared with the CIC approach. Despite this, Voronoï tessellations have been incorporated into a solver as a full hydrodynamics scheme 24 , and subsequently developed into Voronoï particle hydrodynamics (VPH) 25 . The advantages offered by such a solver are however offset by the additional complexity of the formulation, and need to generate the tessellation for the entire droplet field at every point in time.
An option which mirrors the thinking behind the use of Voronoï tessellations is that of the Delaunay triangulation, with these two procedures being closely related as the dual graphs of each other. This is motivated by the use of Delaunay triangulations in scattered data interpolation, with the triangulation effectively providing a mesh over which commonly employed interpolation procedures can be used between the known values at data points. The desirable property unique to the Delaunay triangulation is that it is constructed to satisfy the empty circumcircle criterion, which ensures maximization of the internal angles of triangles, with the reasoning that interpolation routines will exhibit greater accuracy and stability over more uniformly shaped triangles. These aspects have seen the Delaunay triangulation developed as a means of constructing density fields from scattered data points 26 , and extended to a hydrodynamical method with the Delaunay tessellation field estimator (DTFE) technique 27 . However, the shortcoming of such an approach is that the Delaunay triangulation necessarily changes discontinuously at certain instants due to the empty circumcircle criterion 25 , meaning that the DTFE is not well suited as the basis of numerical solver.

Requirements of a suitable interpolation procedure
The immediate advantage offered by the FLA methodology is that the exact value of the droplet number density field is already known along trajectories before the use of any interpolation procedure, thereby eliminating the noise that is inherent in procedures such as nearest neighbour or CIC interpolation. The challenge then remains to find an appropriate means of interpolating these values that will retain sufficient accuracy in describing the structures of the droplet field. Existing work has demonstrated that linear interpolation between neighbouring trajectories is able to capture the smooth, and often monotonic, variation of the number density field in steady-state flows well, however the considerably more complex structure of the droplet field in transient flows, for which folds occur and multiple layers are inherent, means that such an approach is not applicable 14 . To effectively extend the concept of linear interpolation to transient flows requires the establishment of a more general framework.
Whilst the ability to extend the Delaunay triangulation to points in -dimensional space does makes it a possible method for interpolating number densities obtained using the gFLA onto a mesh that spans both physical and radial space, using a triangulation-based approach to interpolate FLA data has a number of distinct disadvantages. Principally, the computational expense of having to construct a triangulation over all droplets within each layer of the droplet field, when coupled with the subsequent interpolation procedure of the number density data, would likely negate most of the saving offered by computing the number density along trajectories for the fewer droplets required using the FLA. To compound this, triangulation of droplets would not be able to distinguish and properly resolve voids in the droplet field when using linear interpolation within triangles, which would necessitate the use of a higher order interpolation method that is able to capture the spatial gradient of the number density field, and thereby further increase the computational expense. Finally, whilst knowledge of the Jacobian tensor provides information about the Eulerian-Lagrangian transformation of trajectory variables onto the local Eulerian field for each droplet, and thereby detail of the structures within the droplet field, a triangulation-based procedure is unable to utilise this information in its reconstruction of the number density field.
The potential drawbacks of triangulation-based methods for accumulating the FLA number density data help to elucidate the desirable properties that a suitable interpolation routine should possess. To maintain consistency with the Lagrangian nature of the FLA methodology, a meshfree method which accumulates contributions without prior knowledge of the spatial relationship between droplets would be most appropriate in terms of numerical efficiency, and realising the optimal computational saving which can be achieved using the FLA. In this sense, the contribution of a given droplet should not be dependent on that of other droplets, and the procedure for the accumulation of droplet contributions should also be valid regardless of the number of droplets that contribute to the Eulerian field at a given point. Such a requirement is further justified in the context of the FLA by considering that since droplet contributions are first accumulated within layers of the droplet field, then interpolating between droplets becomes an issue when a layer contains sufficiently low numbers of droplets, and is not possible in the case when the number of droplets is less than or equal to the dimensionality of the simulation. In particular, it is possible that a certain layer may contain only a single droplet, however it is still desirable that the contribution from this droplet is included in the reconstruction of the Eulerian number density field.
Additionally, an interpolation scheme should be able to account for the fact that the number density field for inertial droplets within transient flows has defined clusters and voids, and in particular that droplets in voids can be sparse. It is therefore desirable that the influence of those droplets that are present in voids is able to extend over the surrounding domain to a greater degree, in order to reflect that there are fewer other droplets in the immediate vicinity from which contributions to the Eulerian droplet field can be accumulated. The nature of this demand is most appropriately met through the use of a meshfree method in which a degree of control over the domain of influence that individual droplet contributions have can be exercised.

Kernel regression
Within the context of accumulating the number density data obtained from the FLA, the problem of capturing clusters and voids in the droplet field at a competitive computational cost is addressed here through the means of kernel regression. The motivation behind this choice is that, as a form of statistical learning, the averaged nature of such an approach enables it to effectively handle spatial locations at which only a single data point makes a contribution. This is seen by considering the form of the Nadaraya-Watson estimator 19 in which ( , ) is the instantaneous number density along the trajectory associated with droplet at time as obtained from the FLA, ( , ) is the Eulerian number density field, ( , ) is a kernel which provides a weighting for the contribution of the droplet to the accumulation point , and is the number of droplets for which the range of support of the associated kernel includes the point . In Eq. (8), the numerator provides the weighted sum of the Lagrangian number density contributions ( , ) at , whilst the denominator is equal to the sum of the weighting factors given by the kernels that contribute at . Consequently, in the accumulation procedure for ( , ) it is seen that the contributions from ( , ) are effectively weighted by weights defined by In this sense, the Nadaraya-Watson estimator (8) can be considered as an expansion in renormalised radial basis functions 19 , and it is the appearance of the denominator in Eq. (9) which enables a representative value for ( , ) to be obtained even in the case when = 1. This circumvents the requirement for a specific number of droplets to be within the kernel in order to satisfy its normalisation condition, albeit at the introduction of a statistical error arising due to the low sample size. Nonetheless, for areas of extreme sparsity in inertial droplet fields, this provides an appropriate means by which to infer the behaviour of ( , ) without having to resort to using an exhaustive number of droplets in simulations. For specification of the kernel, a range of functions including Gaussian, Epanechnikov, and tricube kernels are commonly used. In the present work, a Gaussian form is chosen due to its possession of ∞ continuity and convenient geometrical interpretation.
In the one-dimensional case, this is specified by where ℎ is the smoothing length of the kernel. In this form only the Euclidean distance between the droplet position and accumulation point is used to determine the relative contribution of droplets, resulting in a spherical kernel. A more generalised approach can be taken by using the concept of structured kernels, in which distinct smoothing lengths along arbitrary axes of alignment can be defined 19 . For the specific case of a multivariate Gaussian form, such a kernel is given by where is the bandwidth matrix that contains information about the smoothing lengths in different directions. The particular choice of = ℎ 2 corresponds to a spherical kernel, in which case Eq. (11) accordingly reduces to Eq. (10). Control over the kernel in simulations is specified solely by the choice of the smoothing length for spherical kernels, or the bandwidth matrix for structured kernels, and this is dependent on the specific context in which the method is being applied. In general, however, it is important that the smoothing length(s) are large enough to create a field representation which varies smoothly in space, yet not so large that excessive smoothing is applied, in which case detail from the original data will be lost. Therefore, a crucial aspect in the application of kernel regression is to ensure that care is taken in the selection of an appropriate smoothing length for a given situation.

Utilisation of the Eulerian-Lagrangian transformation tensor
The main strength of using kernel regression for accumulation of the FLA number density data lies within the ability to use information from the FLA to specify the smoothing length ℎ or bandwidth matrix of the kernels, in order to gain a finer degree of control over the reconstruction of the number density field than is offered by a constant-sized kernel. The rationale for this procedure is in alignment with the more general framework of metric learning methods for kernel regression, in which the kernel is adaptively defined according to the local behaviour of the interpolation data 28,29 . In the case of the standard FLA formulation this can be accomplished by using the Eulerian-Lagrangian transformation tensor to specify the kernel size and shape associated with each droplet, which enables the highly varying sparsity that is observed in the droplet number density field of unsteady flows to be accounted for. Such a procedure is consistent with the meshfree nature of kernel methods, since it respects the individuality of Lagrangian trajectories by having a distinct bandwidth matrix for each droplet rather than each Eulerian position . Furthermore, using the Eulerian-Lagrangian transformation to determine the bandwidth matrix is also theoretically sound, as it makes use of the physical interpretation of in describing how the influence of a given droplet is exerted locally upon the surrounding domain, and accordingly specifies the range and shape of support for the kernel that is used to accumulate the number density contribution from that droplet to the associated Eulerian field representation. The physical correctness of this can be inferred directly from the FLA using Eq. (2), by considering that if the number density associated with an individual droplet is high, the domain over which it acts must be reduced to conserve the mass contribution of that droplet along its trajectory. The necessary information about the domain size is seen to be provided by the Jacobian determinant |det( )|, and directly associating this with the domain of support of a kernel-based method then naturally incorporates this physical information within the numerical accumulation procedure for constructing the Eulerian number density field. Additionally, specifying the kernel using is equivalent to interpreting the smoothing length(s) as a representative lengthscale(s) for a given kernel. This is consistent with the introduction of a lengthscale within the definition of number density, a concept which has been emphasised in previous work as being necessary to provide a meaningful representation of the number density field 30,31 .
Using the Eulerian-Lagrangian transformation tensor to determine the range and shape of support of the kernel for a droplet also provides a way of mitigating against the occurrence of singularities in the Lagrangian number density data that the FLA provides. This is because whenever a singularity in number density occurs, the corresponding value of det( ) passes smoothly through zero, and this is associated with the domain over which the local Eulerian-Lagrangian transformation applies also reducing to zero. Therefore using to specify the kernel at this point will result in a kernel with a zero range of support, i.e. mathematically equivalent to the Dirac delta function. Since the Eulerian number density field is constructed over a discrete grid with finite grid spacing, once the range of support of a kernel becomes small enough in the case when |det( )| is sufficiently small, the contribution from the droplet at that point will fall below the size of an Eulerian cell and no longer extend to cover any grid points, and will therefore not be included in the Eulerian field representation. Thus specification of the kernel in accordance with offers a means of effectively filtering out or limiting the propagation of the non-physical contributions associated with the high number density values that are inherent to the FLA, and thereby maintaining a more realistic description of the number density field in regions of the droplet field that contain crossing trajectories.
In practice, such a procedure can be applied in the cases of both spherical and structured kernels. For a spherical kernel, the smoothing length ℎ is determined by setting the size of the spherical domain of the kernel to be equal to the elemental volume deformation |det( )| associated with a droplet along its trajectory. For a flow configuration with a spatial domain of ℝ , this determines the smoothing length for the kernel associated with a droplet at a given time as where ℎ 0 is the initial smoothing length associated with the droplet at time 0 and position 0 , and is the number of spatial dimensions in the system under consideration. Using Eq. (12), the smoothing length ℎ( 0 , ) is automatically updated at each timestep, thereby adjusting the size of the kernel directly in accordance with the value of |det( )| at that time. This methodology is consistent with the concept of a variable smoothing length in SPH, where the simplest approach to updating ℎ utilises the number density ( , ) such that 32 For monodisperse droplets, it is seen that substitution of the Lagrangian continuity equation (2) into Eq. (13) demonstrates the equivalence between Eqs. (12) and (13) as procedures for adaptively scaling ℎ( 0 , ), which highlights the rational physical basis for Eq. (12).
In the case of a structured kernel, a positive semi-definite bandwidth matrix is constructed by taking It is appropriate to use the initial one-dimensional smoothing length ℎ 0 within the definition of this structured kernel due to the initial condition on the Eulerian-Lagrangian transformation tensor of ( 0 , 0 ) = in Eqs. (5), from which it follows by Eq. (14) that the initial bandwidth matrix 0 = ( 0 , 0 ) = ℎ 2 0 , and is therefore a spherical kernel. The bandwidth matrix ( 0 , ) can then subsequently be updated at each timestep using Eq. (14), adjusting the size, shape, and orientation of the kernel directly in accordance with the Eulerian-Lagrangian transformation tensor along a trajectory at that time. Specification by means of Eq. (14) results in an ellipsoidal kernel which represents the deformation of the initial spherical kernel at that point along a trajectory, with the lengths of the semi-axes and angles of rotation from the Cartesian axes determined by the components of at that point. Importantly, this procedure can be physically interpreted as specification of the shape and size of the range of support for the kernel associated with an individual droplet in accordance with the spatial structures of the Eulerian number density field. For instance, in regions of high droplet number density where droplets cluster in elongated structures, the ellipsoids associated with the kernels for these droplets will be aligned along these structures, and this ensures that the number density values along trajectories are projected only onto the areas of the Eulerian number density field which have similar physical behaviour in terms of droplet clustering. Therefore using the Eulerian-Lagrangian transformation tensor to specify the kernel provides a data-driven approach to the accumulation of droplet contributions in a manner which is physically consistent with the droplet number density field, and requires no further assumptions to be made beyond specification of the initial smoothing length ℎ 0 . A demonstration that specification of by Eq. (14) defines a kernel for which the domain size varies proportionately to |det ( )| is provided in Appendix B.

Extension of the phase space to include droplet radius
The kernel regression procedure can straightforwardly be extended to include the droplet radius within the phase space coordinate vector, enabling the droplet size distribution to be determined for polydisperse flows using the gFLA by means of Eq. (8), where now instead of ( , ) it is ( , , ) that is being reconstructed. In this case, it is inappropriate to use a spherical kernel, since the scales of the domains for which physical space and radial space are relevant will be different. For the same reason, and also due to the higher dimensionality meaning that the same number of droplets will be more sparsely distributed in ( , ) space, consideration of the structures in the droplet probability density field ( , , ) is no longer as important, and it is sufficient to vary the kernel size based on just the magnitude of the phase space Jacobian determinant det( ( 0 , 0 , )). In this case a structured kernel of the same form as in Eq. (11) is used, however it is now written in terms of the phase space coordinate = ( , ) and phase space trajectories = ( , ) as with the bandwidth matrix now being defined by where ℎ 0 and ℎ 0 are the initial smoothing lengths in physical and radial space respectively. Whilst the resulting kernel is still ellipsoidal, it remains aligned with the Cartesian axes rather than rotating to align with any structures in the droplet number density field. Furthermore, the kernel remains spherical across all spatial dimensions , with separate control of ℎ 0 and ℎ 0 then allowing for the range of support for the kernel in physical and radial space to be determined independently. In this manner the contributions from droplets can be appropriately accumulated to produce the Eulerian probability density field in ( , ) space, and thereby a description of the droplet size distribution at each point in space.

Numerical implementation
In practice, a Gaussian kernel has an infinite range of support with non-zero contributions at every Eulerian grid point in the domain, although these will quickly become negligible at a certain distance away from a droplet. Since kernel regression normalises the resultant Eulerian number density field using only those droplets which contribute at a given point, it is computationally favourable to limit the extent of the range of support for droplets by imposing an artificial range of compact support. This can be realised in the general case of the structured multivariate Gaussian kernel (15) by using the Mahalanobis distance associated with a given kernel to determine which Eulerian grid points will receive non-negligible contributions from that kernel. Specifically, employing the often used 3-sigma rule of the Gaussian distribution, which states that 99.8% of the contributions from a given droplet lie within three smoothing lengths of its position, the condition for Eulerian grid points to receive a contribution from that droplet is √ This procedure then automatically accounts for the shape and orientation of the kernel, and cuts off contributions beyond the isocontour of the kernel at a distance of three smoothing lengths from the droplet position. The number of gridpoints that a given kernel contributes to is then drastically reduced from the full domain, significantly enhancing the computational economy of the kernel regression process. The same principle can also be applied for the gFLA by replacing and in Eq. (17) with and respectively.
Having a compact support for a kernel is advantageous when it comes to accumulating contributions from different droplets onto the Eulerian field. Since kernel regression is a meshfree method, and the kernel is specified uniquely for each individual droplet, it is convenient to consider the extent of all the contributions made by a single droplet onto the Eulerian gridpoints. As the kernel possesses a compact support given by Eq. (17), the subset of gridpoints which receive a non-zero contribution from a given droplet can then be determined. In the case of a uniform Cartesian grid, it is judicious to use the minimum bounding box which encloses the region defined by the Mahalanobis distance using Eq. (17). This can be realised for the multivariate Gaussian kernel (15), since the maximum extent of the range of compact support in the th coordinate direction is given by in which summation over is not implied, and the coefficient of 3 arises due to invoking the 3-sigma rule in defining the compact support. In this manner, the subset of the domain to which a single droplet makes contributions at a given point in time can be easily determined from the uniquely defined kernel associated with that particular droplet, enabling efficient accumulation of these values onto the Eulerian field. This is particularly simple as the use of a droplet search algorithm is not required, and takes advantage of the Lagrangian nature of kernel regression along with the fact that the Eulerian gridpoints are uniformly spaced. For more general or dynamically generated grids a further procedure is needed to determine the gridpoints which lie within the Mahalanobis distance of a droplet, adding to the computational expense of the interpolation procedure. Kernel regression is able to construct a representative Eulerian number density field that respects conservation of droplet phase properties due to the manner in which the accumulation from individual droplets is performed. Specifically, the weights for the Nadaraya-Watson estimator defined by Eq. (9) normalise the droplet contributions at a given gridpoint regardless of the number of droplets which contain that gridpoint within their compact support, meaning that the influence of a given droplet on the accumulated number density field varies. This is exemplified for droplets with a sufficiently high number density, in which case the domain of the associated kernel will become correspondingly small and not extend to any of the neighbouring gridpoints, thus effectively filtering the contribution of that droplet from the Eulerian field at that instant in time. Despite this, kernel regression is still able to produce a physically consistent Eulerian field, and in particular is well suited to pairing with FLA number density data as it provides an estimator for the Eulerian number density field that is independent of the number of sampled trajectories. This is in keeping with the ethos of the FLA, in that by representing the dispersed phase as a continuum, only a subset of the trajectories required by a conventional accumulation procedure are needed to build an accurate description of the number density field.
For the case when the full shape and size of the kernel are specified using by means of Eq. (14), an additional consideration must be made when droplets are in the vicinity of the envelope of folds in the droplet phase and the number density is almost singular. Due to |det( )| becoming zero when a trajectory crosses the envelope of folds, the shape of the local deformation associated with becomes elongated as the droplet approaches the envelope, and eventually collapses in the dimension perpendicular to the envelope so that it is fully aligned at the point where the droplet crosses the envelope. The consequence of defining the kernel using Eq. (14) is then that as the droplet approaches and moves away from the fold envelope, the kernel becomes excessively elongated far beyond the extent of local structures in the droplet phase, and makes non-physical contributions to the regression procedure at distant gridpoints. It is therefore necessary to impose a restriction on the maximum extent to which a kernel can elongate in order to ensure that the accumulation procedure maintains stability and produces a representative reconstruction of the number density field. This is performed in the present work by simply restricting the kernel shape such that each of the semi-axes of the ellipsoidal domain upon which the kernel is defined cannot be greater than three times the value of the smoothing length ℎ associated with the equivalent spherical kernel at that point, as determined by Eq. (12). In practice, this limit on the elongation factor has been found to represent an acceptable trade off between capturing the anisotropic structures that are present in the droplet number density field, whilst preserving an accurate physical description that does not contain spurious numerical effects. In the event that it is necessary to limit the length of one of these semi-axes, the others are accordingly rescaled so that the value of |det( )| at that point on the given trajectory remains unchanged, thereby enabling droplets in the vicinity of the fold envelope to contribute to the kernel regression procedure in a manner which remains representative of the local structures in the droplet phase.
The important issue of selecting a suitable value for ℎ 0 must be made at the beginning of a simulation, and in general is best specified depending upon the initial spatial distribution of droplets. For a seeding of droplets across an interval over which the Eulerian number density field can be considered constant, it is convenient to take ℎ 0 as the same for all droplets, and the simplest way of setting this value is then to define it as the average inter-droplet spacing across the interval, denoted Δ 0 . Since ℎ 0 is equivalent to the standard deviation in the one-dimensional Gaussian kernel (10), this then ensures that the compact support defined using the 3-sigma rule provides sufficient coverage of the computational domain for the given droplet seeding. It should be noted that the specification of ℎ 0 = Δ 0 is only a guideline however, and whilst this is used across most of the cases in Section 4, some variation may be required in order to obtain either higher fidelity or smoothness of the Eulerian number density field as desired. In practice the range of permissible values for the initial smoothing length of a Gaussian kernel with respect to the average inter-droplet spacing is Δ 0 ∕3 ≪ ℎ 0 ≪ 2Δ 0 , where the lower bound is the minimum size at which kernels with compact support are able to effectively provide coverage of the computational domain, and the upper bound represents the highest degree of smoothing which might reasonably be needed. Furthermore, in the case of an expanding spray in which droplets are initially injected through a narrow inlet but then spread to span a far wider region, the initial spatial distribution of droplets may no longer be an adequate metric for basing ℎ 0 upon, and a larger value will be necessary in order to achieve sufficient coverage of the computational domain with the given droplet seeding.
In terms of the gFLA, ℎ 0 is defined in the same way as ℎ 0 , whilst the radial space smoothing length ℎ 0 is specified in a similar manner in terms of the initial inter-droplet size spacing, denoted Δ 0 . The range of validity for the radial space smoothing length differs compared to ℎ 0 in practice due to the less pronounced occurrence of coherent structures in the Eulerian droplet probability density field , and an appropriate set of values is given by 0.5 Δ 0 ≪ ℎ 0 ≪ Δ 0 for a Gaussian kernel with compact support. Analogously to ℎ 0 , the lower bound provides greater resolution of the droplet size distribution in the Eulerian probability density field, whilst the upper bound offers a higher degree of smoothing across the droplet sizes.

Monodisperse droplets
It is instructive to first focus upon application of the kernel regression procedure outlined in Section 3 to monodisperse droplets for which evaporation effects are not considered, and the extension of the droplet distribution into radial space to account for the droplet size therefore does not have to be included. In the following, a range of flow configurations of varying dimensionality and complexity are used to illustrate the performance of the methodology.

One-dimensional unsteady droplet motion
To begin with, a one-dimensional case of an unsteady droplet velocity field is considered 17 , which operates under the assumption that droplets are characterised by a large inertia ( ≫ 1). Droplets are initially released with a non-uniform velocity distribution, and their motion and Jacobian evolve according to the equations = 0 , Eqs. (19) admit the analytical solution The associated number density ( , ) evolves according to Eq. (2) with initial condition ( 0 , ) = 0 , and the corresponding solution in this case is given by From Eq. (20a) it follows that a fold is formed at = 0.5, with the envelope of droplet trajectories along this being In this case the Eulerian number density field can also be obtained as where 0 = ( 0 , 0 ), and in which it is assumed that droplets moving towards the envelope of the trajectories form a separate layer of the fold to those droplets moving away from the envelope, and that these layers belong to different non-interacting continua. The total number density in the region is simply double that in the remainder of the -domain which is occupied by droplets, resulting in the piecewise nature of Eq. (23). Along the fold envelope (22) the droplet number density becomes infinitely high, however this singularity remains integrable 17 . The analytical solution (23) is graphically depicted in -space in Figure (1a), and features a discontinuity along the internal boundary of the multi-valued region given by the second sub-function of Eq. (23). The region of the domain occupied by the droplet phase is reconstructed on a grid at time intervals of Δ = 0.01 with 100 uniformly spaced points of separation Δ at each time, and uses an initial seeding of 101 droplets positioned uniformly on the interval 0 ∈ [0, 1], corresponding to an initial average inter-droplet spacing of Δ 0 = 0.01.
To assess the efficacy of the kernel regression procedure in Section 3, the Eulerian number density field is reconstructed from the Lagrangian number density (21) using the Nadaraya-Watson estimator in Eq. (8). Since this case is one-dimensional, the kernel given in Eq. (10) is utilised, with the smoothing length ℎ specified by Eq. (12). This produces the result shown in Figure (1b). For comparison purposes, it is appropriate to consider the relative error between kernel regression and the analytical solution, and this is presented in Figure (1c). It is seen that the relative error in the kernel regression is generally below 10 −2 throughout the -domain, except at the fold envelope (22) and along the boundary of the multi-valued region. This more marked error arises due to the smoothing property of kernels, in that the influence of a given droplet is distributed over a small region of the domain, with the consequence that representation of steep gradients and discontinuities in the number density field cannot be exactly captured using this procedure. Therefore it is expected that some additional error will result from the use of kernel regression in reconstructing the number density field, however an appropriate choice of initial smoothing length ℎ 0 can mitigate against this 21 , with a smaller kernel generally being preferable subject to remaining smooth. In this case, the smoothing length is set proportional to the variable grid spacing at each point in time, with the initial value of ℎ 0 = Δ 0 ∕3 being chosen in order to achieve a high degree of fidelity in the reconstructed Eulerian number density field. The overall accuracy of the kernel regression is seen to be good, as can be inferred from examination of the spatial profiles of at selected instances in time, depicted in Figure (1d). The only discernible loss of accuracy is in the vicinity of the discontinuity between the singlevalued and multi-valued regions of Eq. (23) for ≥ 1, as can be seen in the jump in the profiles for = 1.5 and = 2. Moreover, it is seen that whilst remains finite in the vicinity of the fold envelope (22), kernel regression is able to capture the increase in number density well, with the error only becoming greater than 10 −2 along the fold itself.

Two-dimensional fan spray injection in cross-flow
Consider a vertical injection of droplets with velocity magnitude * into a horizontal cross flow with constant velocity = (1, 0). Taking the case of = 1, the droplet and Jacobian equations of evolution, given by Eqs. (1a) and (4a) respectively, arë where the initial conditions for in Eq. (24a) are given by in which the expressions foṙ 12 ( 0 ) anḋ 22 ( 0 ) are derived from 5 The system (24) with initial conditions (25) and (26) It is then possible to numerically evaluate the Eulerian number density field in this case using Eq. (29) and the parametric solutions given in Eq. (28). The droplet field includes a multi-valued region containing separate layers of a fold, with the envelope of this fold occurring along the top edge of the spray, and the other boundary of the multi-valued region presenting an internal discontinuity within the spray. The analytical solution from Eqs. (28) and (29) is graphically depicted as a steady-state distribution in -space in Figure (2a), where the droplet number density field is reconstructed on a uniform Cartesian grid with spacing of Δ = Δ = 0.0025. In this example the value of the parameter used to define the interval in which droplets are injected in Eq. (25) is taken to be = 0.05, with a total of 101 droplets being injected uniformly over ∈ ([− , ], 0) at the start of the simulation, giving a value of Δ 0 = 0.001. Numerical reconstruction of the steady-state droplet field from the Lagrangian number density (29) using the Nadaraya-Watson estimator (8) is shown in Figure (2b). The structured kernel in Eq. (11) is used for this multidimensional case, with the bandwidth matrix being specified by Eq. (14). The relative error of kernel regression is shown in Figure (2c), and as with the one-dimensional case in Section 4.1.1 generally remains below 10 −2 except in the vicinity of the fold envelope, along the discontinuity between the multi-valued and single-valued regions of the spray, and along the edges of the spray. Again, it is the perceptible size of the kernel which causes these more marked errors to arise, and this is especially apparent along the edges of the spray where the kernel extends beyond the region occupied by droplets into the surrounding area. This effect is balanced against the need to keep the kernel large enough to produce a smooth number density field from the sample of trajectories used in the simulation, once again requiring careful choice of the initial smoothing length ℎ 0 . In this instance, the default value of ℎ 0 = Δ 0 is used to achieve this. Notwithstanding this, the overall accuracy of kernel regression can be observed from the profiles of at selected values in Figure (2d), and is seen to generally be good with the exception of the discontinuity along the edge of the multi-valued region in the spray. This is observed in the profile for = 0.5, where kernel regression does not capture the exact location of the discontinuity as a result. Aside from this, the physical behaviour and increases in number density near the fold envelope are well captured.

Flow around a cylinder: steady-state case (Re = 20)
To determine the capabilities of the kernel regression procedure in a context which is more representative of a general engineering flow, the case of a two-dimensional gas-droplet flow around a cylinder is considered. This is a prototypical problem, and exhibits distinct flow regimes depending upon the flow Reynolds number ; specifically a symmetric steady-state flow at low , and a periodic von Kármán vortex street beyond the critical value of at which the transition to unsteadiness occurs. The evolution of the carrier flow is described using the Navier-Stokes equations for an incompressible fluid The cylinder is taken to have radius which is used as the representative lengthscale, and the uniform free-stream velocity at which fluid enters the inlet of the domain is chosen as the associated velocity scale . The Reynolds number is then defined in this work as = ∕ , in contrast to much of the existing literature in which the cylinder diameter is chosen as the characteristic lengthscale. In this context, the critical Reynolds number at which the transition to unsteadiness occurs is = 23.5.
To investigate droplet behaviour in the steady-state regime, the case of = 20 is considered. Since the underlying carrier flow is symmetrical about the cylinder in the direction normal to the flow, the droplet distribution also exhibits this symmetry. Droplets are injected at ∕ = −5 over the interval ∕ ∈ [−4, 4] with the free-stream carrier flow velocity ∕ = (1, 0). In the steady-state cases, a total of 101 droplets are injected over this interval in a square-law profile at the start of the simulation to give an average inter-droplet spacing of Δ 0 = 0.08, with the droplet seeding increasing towards the centreline ∕ = 0. Interaction of the droplets with the cylinder is not included within the simulation, and any droplets which do come into contact with the cylinder are subsequently omitted from the calculation of the number density field at later times.
The droplet trajectories evolve according to a linear drag law, which for the case of monodisperse non-evaporating droplets determines the equations for motion and Jacobian evolution from Eqs. (1a) and (4a) as The associated initial conditions are given by where the initial condition oṅ is due to the uniformity of the droplet phase at the point of injection. The Lagrangian number density ( , ) associated with a droplet is then calculated using Eq. (2) without the appearance of the droplet radius as a parameter.
Eqs. (31) are solved using a customisation of the Lagrangian particle tracking library in OpenFOAM, and can be coupled to the required solver for the carrier flow; in this case pimpleFoam has been used to solve Eqs. (30). The domain size for the simulations is −20 ≤ ∕ ≤ 30 and −20 ≤ ∕ ≤ 20 in a Cartesian coordinate system centred on the cylinder. Further details of the numerical setup, mesh independence, and validation can be found in 14 .
In order to reconstruct the Eulerian droplet number density field using the Nadaraya-Watson estimator (8) in the general multidimensional case, the structured kernel in Eq. (11) is used with the bandwidth matrix specified by Eq. (14). The reconstruction is done on a uniform Cartesian grid with a spacing of Δ ∕ = Δ ∕ = 0.04. This is considered for three different droplet sizes which correspond to = 0.1, 1, and 10, and the number density field reconstructed using kernel regression is displayed in Figures (3a), (3c), and (3e) respectively for each of these cases. The profiles of the number density field at selected values of are also displayed in Figures (3b), (3d), and (3f) for these respective values of .
Both the extent and number density distribution of the droplet field is seen to vary markedly with , with the common feature to all cases being the wake behind the cylinder which is devoid of droplets. For = 0.1 the droplets follow the flow relatively closely, and the two regions of the droplet field formed by droplets travelling above and below the cylinder meet downstream of the cylinder after a distance of ∼ 8 . In contrast, the droplet field at the higher values of remains separated into two distinct regions beyond = 25 . The other key difference between the cases lies within the rate of variation in number density of the droplet field. For = 0.1 the droplet field is largely uniform away from the cylinder, and only experiences rapid variation along the edges of the cylinder wake, whilst = 1.0 and = 10 display a larger region of variation in the number density field, but with the change being progressively more gradual.
The profiles in Figures (3b), (3d), and (3f) provide a means of assessing the efficacy of kernel regression against the exact values of number density obtained from the FLA. This is possible since in the case of a steady-state flow it is permissible to interpolate the values of number density along trajectories between time points 14 , and therefore obtain an accurate descriptor of the number density field along given cross-section profiles independently of kernel regression. Owing to the steady-state behaviour of the flow, the number density varies smoothly without the presence of folds and also largely monotonically, and these characteristics essentially mean that linear interpolation of the number density values along trajectories onto a chosen profile provides an exact solution against which kernel regression can be compared. Of particular note is the case for = 0.1, in which the kernel regression performs well across a large part of the flow, but loses accuracy in the region of rapid variation along the edge of the cylinder wake, as depicted in Figure (3b). This is due to the perceptible size of the kernel causing the regression procedure to introduce a certain level of smoothing into the result, and therefore limiting the level of variation in number density which can be accurately captured. However, this is balanced against the kernel needing to be large enough to provide a smooth representation of the number density field for a given initial droplet seeding within a simulation, and as with the previous examples emphasises the importance of judicious selection of the initial smoothing length ℎ 0 . In these cases the default value of ℎ 0 = Δ 0 is selected to provide a reasonable coverage of the computational domain for the chosen initial seeding of 101 droplets which is used. Increasing the initial droplet seeding would enable a smaller value of ℎ 0 to smoothly reconstruct the number density field, which would in turn provide greater accuracy in the regions of rapid variation at lower . Nonetheless, at higher the more gradual variation in number density for the different profiles is successfully accounted for by kernel regression across the entire droplet phase, and the procedure provides a high level of accuracy as demonstrated in Figures (3d) and (3f).
An illustration of the computational efficiency afforded by kernel regression can be made for this steady state approach by comparing to the number density field obtained using direct trajectory methods. In this instance, the Cloud-In-Cell approach that was outlined in Section 3.1 is used, with the accumulation made upon the same Eulerian grid as used for kernel regression. Since the number density is calculated directly in the CIC approach, a sufficient number of droplets are required in each grid cell to produce a stable result, and this is achieved by a higher initial seeding of droplets across the injection interval ∕ ∈ [ −4, 4]. In contrast to the FLA which uses 101 droplets within this range, the CIC approach requires 10001 uniformly spaced droplets across the interval for the resultant Eulerian number density field to sufficiently converge, and furthermore the injection rate of the droplets also has to be increased by a factor of 10. Thus overall the number of droplet realisations required by the CIC approach is found to be 10 3 times more than that by the FLA, consistent with previous works. The number density fields produced by this procedure are displayed in Figures (4a), (4c), and (4e) for each of = 0.1, 1, and 10 respectively, and it can be observed that the number density distribution is largely similar to that obtained from the FLA in the corresponding cases shown in Figures (3a), (3c), and (3e). The much higher seeding of trajectories needed for the CIC approach does, however, result in some differences to the profile of the wake behind the cylinder, and this is best seen through a direct comparison of the two procedures as illustrated in Figures (4b), (4d), and (4f). This depicts the relative error between the number density fields produced by the kernel regression and CIC approaches, and it is seen that this is largest along the edge of the wake, where the error can exceed 10 −1 . Away from the wake however, the error generally varies between 10 −2 and 10 −1 , which is indicative of the level of accuracy that kernel regression of FLA data is able to achieve using 10 3 times fewer droplet realisations, and the associated computational speedup that comes with this.

Flow around a cylinder: transient case (Re = 100)
For consideration of droplet behaviour in the transient regime, the case of = 100 is used. The underlying carrier flow becomes periodic at this level of unsteadiness, and forms the well-known phenomenon of a von Kármán vortex street. The configuration is identical to that in Section 4.1.3 except that the value of is higher and the initial droplet seeding consists of 81 droplets that are injected over the interval ∕ ∈ [−4, 4] with uniform spacing at each timestep. This corresponds to an initial average interdroplet spacing of Δ 0 = 0.1, however the initial smoothing length is kept as ℎ 0 = 0.08 as for the steady-state case, in order to achieve better resolution of the transient structures in the droplet number density field. In practice, this value of ℎ 0 = 0.8 Δ 0 also represents the lower size limit of initial smoothing length that will result in a smooth number density field for transient configurations, and as such has been found to be the optimum value for achieving this trade-off in these cases. As before, the carrier flow evolves according to Eqs. (30) and the droplets and Jacobian are governed by Eqs. (31). The three distinct droplet sizes corresponding to = 0.1, 1, and 10 are again considered, with the number density field reconstructed using kernel regression at = 40 displayed in Figures (5a), (5c), and (5e) respectively for each case, and the associated profiles of the number density field at selected values of given in Figures (5b), (5d), and (5f).
The droplet behaviour is seen to reflect that of the carrier flow, with clear build-ups and voids in the number density field that are directly influenced by the structure of the vortices that form in the wake of the cylinder. The crucial feature of the dispersed phase behaviour is the effect of different levels of droplet inertia on the number density field at the different values of . For = 0.1 the voids in the droplet number density field are closely aligned with the location of vortices in the carrier flow due to the relatively low droplet inertia, as observed in Figure (5a). This is consistent with Maxey's centrifuging mechanism 33 , which argues that low inertia droplets are ejected from areas of high vorticity. Of greater interest are the areas of higher droplet number density, which form along distinct curves between the vortices. The representation of this in Figure (5a) is a demonstration that kernel regression is able to capture this level of detail accurately in the reconstructed number density field, and successfully account for the complex behaviour of the droplet field in transient flows. At higher , the voids of low droplet number density that occur around vortices become much larger, in addition to the appearance of a wake downstream of the cylinder in which no droplets are present, as seen in Figures (5c) and (5e). For = 1 in (5c), the droplet field evolves as alternating regions in which droplets occur followed by voids that contain no droplets, with the region in which droplets are present characterised by a fold line of high number density along the leading edge and an area of lower number density along the trailing edge. For = 10 this behaviour is also observed, with the fold lines exhibiting a higher number density than for = 1, and the trailing edge of the droplet field containing both regions of lower number density and distinct layers of folds as observed from the discontinuities in the droplet field in Figure (5e). The successful capturing of these fold layers shows that kernel regression is able to retain this level of detail from the FLA number density data, and account for this phenomenon at a far lower computational cost than direct trajectory methods 15 .
The profiles of the number density field in Figures (5b), (5d), and (5f) exhibit the different levels of variation in number density that the kernel regression procedure is able to reproduce. For intervals where there is no data for a given profile, the number density is zero as there are no droplets present at that point. It can be seen that as increases, the maximum number density within the flow becomes higher, and the maximum spatial gradient of the number density also increases. In particular, for = 1 and = 10 the profiles in the vicinity of fold lines display rapid variation in number density, however kernel regression is capable of providing a smooth representation of this behaviour, which serves to illustrate the flexibility of this procedure for use in complex flow configurations.

Polydisperse droplets
Generalisation to the case of polydisperse droplets enables investigation of the droplet size distribution within a flow. Extension of the kernel regression framework into radial space is outlined in Section 3.5, and in the following some of the flow configurations from the monodisperse cases in Section 4.1 are used to illustrate the efficacy of this procedure.
To specify the different droplet sizes within simulations, an appropriate probability distribution is defined at the outset as a function of the initial droplet radius 0 = ( 0 ). Following previous work 14 , a log-normal distribution that is the same at all initial locations 0 = ( 0 ) is assumed, in which the mean and standard deviation parameters are chosen to be = 0.16 and = 0.4 respectively. The droplet size 0 in Eq. (33) is nondimensionalised by * 0 , which is a reference droplet radius corresponding to the peak of the distribution. Whilst a log-normal distribution is representative of the spread of droplet sizes in some applications 34 , an alternative distribution, for instance Rosin-Rammler, can easily be specified if it is more physically appropriate.
Of interest in polydisperse droplet flows is not only the distribution of droplet sizes, but also the spatial distribution and average size statistics across all sizes of droplet. These can be determined by considering the moments of , which utilises its interpretation as a probability density field. Specifically, the number density , average radius , and radius variance ′ ′ can be obtained using the definitions The averaged field variables in Eqs. (34) are evaluated within simulations by numerically integrating the probability density obtained from the kernel regression procedure across all droplet sizes in accordance with the various radius weightings used.
Here attention is restricted to just the moments of given in Eqs. (34), however it is possible to define further higher-order statistics of the droplet size distribution as required by the case under consideration, for instance the skewness and kurtosis may be of interest in transient flows. It is also possible to obtain relevant statistics for industrial spray systems, for instance the Sauter mean diameter, from using a similar procedure.

One-dimensional quiescent flow with evaporating droplets
To illustrate the behaviour of evaporating droplets in a simple case, a one-dimensional flow of droplets in quiescent air with uniform temperature is considered, as previously used in 14 . Droplets are initially located at 0 = 0, with velocity 0 = 1, radius 0 ∈ [0, 4] and the probability density ( ( 0 ), ( 0 ), 0 ) as specified by Eq. (33). For the case of * 0 = 1, the droplet motion and evaporation governed by Eqs. (1) becomë and the corresponding Jacobian evolution given in Eqs. (4) becomes with the initial conditions as given in Eqs. (5) along witḣ ( 0 , 0 , 0 ) = 0. The systems (35) and (36) admit analytical solutions along trajectories as detailed in 14 . The probability density along trajectories is calculated using Eq. (2), however to reconstruct the probability density field it is necessary to use multidimensional kernel regression as outlined in Section 3.5, with the corresponding phase space for this case being = ( , ). For this case, the droplet evaporation rate is specified by = 1, and a total of 100 droplet sizes defined uniformly over the range 0 ∈ [0, 4] are injected at the start of the simulation, giving an initial inter-droplet size spacing of Δ 0 ∕ * 0 = 0.04. The initial radial space smoothing length is set as ℎ 0 = 0.5 Δ 0 to achieve a good resolution over the droplet size distribution, whilst as all droplets are started from the same location at = 0 and Δ 0 is consequently not defined in this case, the initial physical space smoothing length is also taken to be ℎ 0 = ℎ 0 ∕ * 0 = 0.5 Δ 0 ∕ * 0 for consistency. The instantaneous phase space probability density field is reconstructed on a uniform Cartesian grid with spacing Δ = 0.04 and Δ ∕ * 0 = 0.04 The complete Eulerian probability density field in ( , ) space is depicted in Figure (6a), and it is seen that kernel regression is able to reproduce a smoothly varying distribution across both physical space and radial space, enabling the full behaviour of the evaporation process to be examined. In Figure (6b), the droplet size distribution profiles at selected locations are shown. The profile at = 0 is the initial probability density distribution specified in Eq. (33), with the profiles at subsequent locations reflecting the decrease in probability density as droplets evaporate whilst they travel through the domain. Figure (6c) depicts the droplet spatial distribution profiles at selected droplet sizes, with the peak probability occurring for = 1 as also determined by Eq. (33). The averaged field variables obtained from the moments of using Eqs. (34) are given in Figure (6d). The number density initially shows a minor increase at small , before droplets begin to fully evaporate and subsequently decreases exponentially. In contrast, the average radius initially decreases at small , before gradually growing and reaching its peak value at ≈ 5.5. This is indicative of the small droplets which form the peak of the distribution (33) evaporating more quickly than larger droplets, causing the average size to increase during this period. Following this, decreases as the larger droplets evaporate. Finally, the variance in droplet radius ′ ′ slowly increases as droplets travel across the domain, reaching a peak value at ≈ 4. This reflects the increased spread of the profiles displayed in Figure (6b) at successive locations, before ′ ′ slowly decreases across the remainder of the domain. Although a simple example, this case serves to demonstrate the wealth of information that kernel regression is able to efficiently extract from the trajectory data in polydisperse flows.

Two-dimensional fan spray injection in cross-flow
This case is an extension of the configuration that was considered in Section 4.1.2 to polydisperse evaporating droplets, and has also been previously studied in 14 . For the case of * 0 = 1, droplet motion and evaporation are governed bÿ where = (1, 0) as before, and the initial conditions for anḋ are identical to those in Eq. (25). The corresponding Jacobian components evolve according tö where the initial conditions are as in Eqs. (5), and those foṙ being the same as in Eq. (26). As in the monodisperse case, the systems (37) and (38) admit analytical solutions along trajectories, and these are given in 14 . Eq. (2) is used to calculate the probability density along trajectories, with multidimensional kernel regression subsequently used to reconstruct the probability density field over the phase space = ( , , ) through the use of Eqs. (15) and (16). In this configuration, the phase space probability density field is reconstructed on a uniform Cartesian grid with spacing Δ = Δ = 0.01 and Δ ∕ * 0 = 0.04, with the droplet evaporation rate being determined by = 1. As in the monodisperse case, = 0.05 is used to define the interval for droplet injection, and in this case at the start of the simulation 100 droplet sizes defined uniformly over the range 0 ∈ [0, 4] are injected at each of 101 locations uniformly spread over the interval 0 ∈ ([− , ], 0). This yields initial inter-droplet spacings of Δ 0 = 0.001 and Δ 0 ∕ * 0 = 0.04 in position and size respectively, however due to the expanding nature of the spray in the spatial domain a larger initial physical space smoothing length of ℎ 0 = 5Δ 0 is required to achieve adequate coverage of the droplet field, whilst the default initial radial space smoothing length of ℎ 0 = Δ 0 is used.
For the case of a multidimensional polydisperse droplet flow this enables the collection of a wealth of information about the steady-state droplet distribution across the spatial dimensions and range of droplet sizes, and Figures 7 and 8 are used to illustrate this, with the variation in droplet size distribution in one spatial direction ( and ) depicted as slices in the other spatial direction ( and respectively). This shows both how the spatial distribution of droplets varies as they travel away from the injection interval, and the range of droplet sizes which are present at a given location, but also importantly the distribution of droplet sizes within that range. It is seen that as droplets travel away from the injection interval in both the and directions, the range of droplet sizes decreases as the larger droplets evaporate, and the droplets become more dispersed in an asymmetrical  profile which is reflective of the cross-flow configuration for the case. Of note is Figure (8a) for the profile at = −1, which illustrates the droplets that initially travel against the flow in the direction from the injection interval. It is observed that only sufficiently large droplets appear at this location, which is a consequence of them having enough inertia to maintain their initial momentum, in contrast to smaller droplets which are more responsive to the flow and do not reach = −1. This highlights the ability of the kernel regression procedure to provide detailed insight into the evolution of polydisperse droplet flows. Of particular interest within polydisperse droplet flows is the number density obtained using Eq. (34a). This is depicted for this case in Figure (9a), and shows the full extent of the droplet spatial distribution across all droplet sizes, with the number density decreasing as droplets move away from the injection interval and evaporate. Profiles of at selected locations are shown in Figure (9b), and illustrate the rapid decrease in number density close to the injection interval. The distribution for the average droplet radius calculated using Eq. (34b) is shown in Figure ( Figure (9f) show that the highest droplet variance occurs at ≈ 3, whilst the spread of variance in the direction increases with greater values of position. Together this collection of information exemplifies how kernel regression is able to produce stable distributions for the averaged field variables from trajectory data provided by the gFLA, and highlights its suitability for application to more general gas-droplet flows.

Flow around a cylinder: steady-state case (Re = 20)
For investigation into the behaviour of polydisperse droplets in a more general flow, the example of two-dimensional gas-droplet flow around a cylinder that was described in Sections 4.1.3 and 4.1.4 is returned to. The carrier flow is governed by Eqs. (30) as previously, and now 41 different droplet sizes within the range ∈ [0. 1,10] are introduced at each of 101 injection points across the interval ∕ ∈ [ −4, 4]. The probability density is initialised in accordance with the distribution (33). In the following the focus is on the ability of kernel regression to reconstruct the droplet size distribution within the flow, and therefore the use of non-evaporating droplets is maintained. To achieve this, the droplet motion and Jacobian evolution are described as in the monodisperse case by Eqs. (31), and supplemented with the droplet evaporation rate = 0 and equations and associated initial conditions for the other blocks in Eqs. (4) and (5) respectively to yield = , = and = 1. As with the previous polydisperse droplet examples, reconstruction of the droplet probability density field follows the procedure outlined in Section 3.5, with the structured kernel defined as in Eq. (15) and the bandwidth matrix given by Eq. (16). The probability density field is accumulated onto a uniform Cartesian grid with a spatial spacing of Δ ∕ = Δ ∕ = 0.04 as in Section 4.1.3, and radial spacing Δ ∕ * 0 = 0.133. The initial inter-droplet spacings in location and size are given by Δ 0 ∕ = 0.08 and Δ 0 ∕ * 0 = 0.1 respectively, and the initial smoothing lengths set as ℎ 0 = Δ 0 and ℎ 0 = 0.667 Δ 0 accordingly. The kernel regression procedure again enables the detail of the droplet distribution across the spatial dimensions and range of droplet sizes to be obtained, and in this case the variation in droplet size distribution in the direction is shown in Figure 10 as slices in the direction to demonstrate this. Since the flow is steady-state, the distribution behaviour that is observed across the slices varies in a straightforward manner, with the width of the wake behind the cylinder decreasing as the distance from the cylinder at ∕ = 0 is increased. In terms of the droplet size distribution, it can clearly be seen that at a given distance from the cylinder, the width of the wake is less for smaller droplets, agreeing with the behaviour previously observed in Figure  7 for monodisperse droplets at selected values of . Indeed, it is possible to extract the spatial number density field at a given droplet radius from the reconstructed field , and choosing the values of corresponding to = 0.1, 1, and 10 reproduces the distributions displayed in Figures (3a), (3c), and (3e) respectively. Furthermore, at any point in the domain, it can be seen in Figure 10 that there is a clear distribution across the range of droplet sizes, which is reflective of the initial size distribution specified by Eq. (33). The number density is displayed in Figure (11a), and shows the behaviour of the droplet spatial distribution across all droplet sizes. It is seen that the wake region behind the cylinder with no droplets present corresponds to that obtained in Figure  (3a) for the smallest droplet size of = 0.1. Additionally, the number density varies smoothly and in general gradually when moving from the centreline ∕ = 0 to the edges of the domain. This can be seen in the profiles of at selected locations displayed in Figure (11b), and shows the variation in number density at different distances behind the cylinder. The average radius distribution shown in Figure (11c) illustrates that larger droplets accumulate as the droplet field splits into two distinct regions when moving past the cylinder, whilst the region of the droplet field that is downstream of the cylinder wake only contains the smallest droplets. The profiles of in Figure (11d) show that at these selected locations the average droplet size does not vary much within the range of droplet sizes, and is uniformly lower towards the centreline ∕ = 0. Both the distribution and profiles of variance in droplet radius ′ ′ , as depicted in Figures (11e) and (11f) respectively, are qualitatively similar to the behaviour of , with the highest variance observed at the point where the droplet field parts to pass around the cylinder.

Flow around a cylinder: transient case (Re = 100)
For the transient regime of polydisperse droplet flow around a cylinder, = 100 is used as before in Section 4.1.4, and otherwise the configuration is identical to that in Section 4.2.3. In terms of the kernel regression procedure, the probability density field is accumulated onto a uniform Cartesian grid with a spatial spacing of Δ ∕ = Δ ∕ = 0.04 and radial spacing Δ ∕ * 0 = 0.133 as before, with the initial smoothing lengths again specified as ℎ 0 = Δ 0 and ℎ 0 = 0.667 Δ 0 . Again it is primarily the droplet spatial distribution which is of interest, and following the previous section the variation in droplet size distribution in the direction is shown in Figure 12 as slices in the direction at = 30. Whilst the probability density distribution in Figure (12a) displays only minor deviations from being symmetrical at a distance of ∕ = 3 behind the cylinder, at increasing distances this symmetry breaks down as the vortex street becomes established. At a sufficient distance, the probability density distribution at a given slice then exhibits regions with either no droplets or a high probability density depending where in the periodic cycle of the flow the slice is located, for example at ∕ = 15 as displayed in Figure (12e). Due to the transient nature of the flow, the effect of specifying the initial probability density using Eq. (33) is less apparent than in the previous steady-state example, however the highest probability density values are still found among the smaller droplet sizes, with fewer larger droplets occurring. A wake is seen to persist for some distance behind the cylinder for only the largest droplets, but eventually ceases to exist once the periodic flow has become fully established. As with the steady-state case, the spatial number density field at a given droplet radius can be extracted from the reconstructed field , whereby selecting the values of corresponding to = 0.1, 1, and 10 retrieves the distributions in Figures (8a), (8c), and (8e) respectively. Figure (13a) displays the number density at = 30, exhibiting the cumulative effect of all droplet sizes on the droplet spatial distribution. It is observed that all droplet sizes are ejected from the vortices by virtue of their inertia, with clustering occurring along curves which are distinct for different droplet sizes. This is clearly an accumulation of the behaviour previously displayed in Figures (8a), (8c), and (8e) for different droplet sizes in the monodisperse case, and exemplifies the ability of kernel regression to account for this range of behaviour in the process of reconstructing the number density field. The variation in number density at different profiles is displayed in Figure (13b), and it is seen that the behaviour of a given profile depends largely on its location with respect to the periodic cycle of the flow. For example, the number density at ∕ = 9 is largely uniform for this cross-section of the droplet field, however the profile closer to the cylinder at ∕ = 3 shows much higher variation as it passes through one of the vortices at this particular snapshot in time. The distribution and profiles of the average droplet radius at = 30 are shown in Figures (13c) and (13d) respectively, and evidence that larger droplets accumulate along the distinct curves between the vortices, with only smaller droplets existing in the vicinity of the vortices. As with the steady-state case in Figure  11, the distribution and profiles of variance in droplet radius ′ ′ at = 30 and illustrated in Figures (13e) and (13f) respectively are qualitatively similar to the behaviour of in this case.

CONCLUSIONS
This paper has presented a novel methodology for reconstructing the droplet number density field using the probability density along trajectories calculated by the generalised Fully Lagrangian Approach (gFLA; the Fully Lagrangian Approach taking into account the effect of droplet evaporation), and the technique of kernel regression to accumulate the contributions from individual droplets onto an Eulerian grid. This has been applied to a range of steady-state and transient flow configurations for both monodisperse and polydisperse droplets, and it has been demonstrated that the kernel regression procedure can reliably produce a smooth representation of the droplet probability density field and average field variables both in space and across the range of droplet sizes.
This approach has several advantages. The strongest benefit of kernel regression is making use of the meshfree nature of the method to define the kernel using the Jacobian tensor from the gFLA, so that individual droplet contributions are extrapolated to the surrounding Eulerian region in accordance with the local structure of the droplet number density field. This results in a consistent procedure that enables the retention of a high level of detail in the number density field from a relatively small sample of droplets, in contrast to a conventional Cloud-In-Cell approach which employs elementary averaging and therefore requires both more trajectories and a more frequent injection rate to reliably reproduce the number density field. It is demonstrated that kernel regression yields a stable result with 10 3 times fewer droplet realisations than the CIC approach, and is therefore able to achieve the potential reduction in the computational cost required to obtain a smooth yet qualitatively accurate continuum representation for inertial droplet field variables. The informed use of data in such a way is becoming increasingly important, as the storage of datasets becomes impractical for highly resolved simulations due to the amount of information involved. Another consequence of being a meshfree method is that discontinuities in the droplet field are automatically dealt with by kernel regression, such as at the edge of the droplet region, and in this manner arbitrarily complex flows can be treated without requiring prior knowledge of the extent of the domain which is occupied by droplets.
Additionally, varying the size of the kernel in this way means that areas of sparse droplet number density can be effectively dealt with. Particle-based methods such as smoothed particle hydrodynamics require a minimum number of particles to satisfy the normalisation condition of the kernel, limiting their use to weakly compressible flows in which the number density of particles is roughly constant. In contrast, kernel regression satisfies the normalisation condition of the kernel regardless of how many particles are within the compact support, making the approach applicable to voids in which there are relatively few particles. Furthermore, specification of the kernel using the Jacobian provides a means of obtaining a smooth representation for regions containing few droplets, as the low number density associated with droplets in these regions results in the compact support of the kernel being proportionately large, and extending to cover the areas between sparsely distributed droplets.
Computationally, the procedure can be implemented by considering the contributions a given droplet makes to gridpoints within its compact support, rather than having to identify all the droplets which contribute at a given gridpoint. When the reconstruction is performed upon a Cartesian grid this avoids the need for a droplet search algorithm, and is therefore relatively efficient. It is also conceptually trivial to extend the kernel regression procedure to account for the droplet size distribution by extending the dimensionality of the kernel, and reconstructing the probability density field on a higher-dimensional grid.
The representation of the number density field produced by kernel regression is, however, a locally constant estimator, and is unable to predictively extrapolate beyond the range of number densities which are sampled along trajectories. This is a shortcoming that is also inherent in linear interpolation, and means that in areas of rapid variation in number density, kernel regression will not return the exact value of the maximum or minimum local number density unless there is a droplet at that point. Furthermore, by its nature kernel regression is a statistical procedure, and is able to achieve a smooth representation of the droplet field only at the expense of some loss of absolute accuracy. This generally also occurs in regions of rapid variation in number density, with the smoothing characteristic of kernel regression meaning that for a given smoothing length and number of droplets only a certain level of variation can be captured. In practice this is most restrictive at a low droplet inertia, as has been observed in this work for low , whilst the accuracy remains good for higher values of . In general, kernel regression requires far fewer droplets than conventional direct trajectory methods to achieve a smooth Eulerian representation of the droplet phase at a reasonable level of accuracy, however increasing the number of droplet trajectories that are sampled will improve the accuracy if needed.
The stability of kernel regression and its versatility across a variety of flow configurations clearly demonstrate the efficacy of this procedure, and its suitability for upscaling and application to more general spray systems. In particular, the ability to accurately determine the complete droplet size distribution and its associated statistics at any spatial location within the droplet field make this methodology relevant to the simulation of many industrial and environmental systems, in which detailed knowledge of the droplet behaviour is paramount to the control and optimisation of such processes.

ACKNOWLEDGMENTS
The authors are grateful to the UKRI Future Leaders Fellowship (Grant MR/T043326/1) for their financial support, and would also like to acknowledge Professor A. N. Osiptsov for his helpful suggestions.

Financial disclosure
None reported.
Eq. (A9) can be formally solved along the phase space trajectory , and along with the initial condition ( ( 0 ), 0 ) = ( 0 , 0 ) yields the solution ( , ) = ( 0 , 0 ) exp where ′ = ( ′ ), and it is assumed that ( 0 , 0 ) > 0 in accordance with the probability density ( , ) being interpreted as a strictly positive quantity. Re-introducing the interpretations = ( , ), = ( , ) leads to the solution in the familiar physical variables ( , , ) = ( 0 , 0 , 0 ) exp where ′ = ( ′ ) and ′ = ( ′ ). Eq. (A11) is the solution for the probability density ( , , ) as it evolves along trajectories. However, this assumes that droplet trajectories are governed by the velocity averaged Eqs. (A5) through the velocity field ( , , ) and evaporation rate ( , , ). Now consider the Jacobian tensor defined by Then taking the partial derivative ∕ 0 of the phase space trajectory described by Eq. (A6), the evolution of ( 0 , ) is determined bẏ where the third line follows since the trace operator is invariant under cyclic permutations of products of arguments, and the final line is due to the trace of the gradient being equivalent to the divergence. Eq. (A15) can be formally solved along the phase space trajectory , which presents the general solution in which no restriction is made upon the sign of the arbitrary constant . Since is determined from the initial condition on det ( 0 , ) , this allows for the case where det ( 0 , ) is negative to be accounted for in the description provided by Eq. (A16). Applying the general initial condition of det ( , 0 ) = 1 (which follows from the definition of in Eq. (A12) and the initial condition upon in Eq. (A6) at time 0 ) then yields Re-introducing the interpretations = ( , ), = ( , ) leads to the equivalent form in physical variables Comparing the solutions for ( , , ) and det ( 0 , 0 , ) in Eqs. (A11) and (A18) respectively, it is seen that conservation of mass along trajectories in ( , ) space can be expressed as from which the Lagrangian form of the continuity equation expressed in Eq. (2) immediately follows. Note that since this formulation hinges upon use of droplet trajectories described by the velocity field ( , , ), the resultant continuum description inherently assumes the invertibility of ( 0 , 0 , ) at a given point in time, that is det ( 0 , 0 , ) ≠ 0. Physically this corresponds to the single-valuedness of droplet velocities at a given point , which in reality is only representative of fluid tracer droplets in the limit → 0. Consequently, this description is unable to qualitatively account for the crossing trajectories effect of sufficiently inertial droplets at finite , for which occurrences of det ( 0 , 0 , ) = 0 are observed along the envelope of folds in the droplet velocity field, and the probability density ( , , ) becomes singular. Similarly, det ( 0 , 0 , ) becomes negative once the associated trajectory crosses the envelope of a fold. However, these phenomena do not violate the physical correctness of Eq. (2) as a conservation law, and indeed do not preclude using values of det ( 0 , 0 , ) calculated from simulation data of inertial droplet trajectories. This can be understood by considering the geometric meaning of det ( 0 , 0 , ) as the volume scaling factor of the linear transformation described by the Jacobian ( 0 , 0 , ), with the associated sign simply showing whether the transformation preserves or reverses orientation. In the context of calculating ( , , ) the orientation of the elemental volume is not relevant, and therefore using |det ( 0 , 0 , ) | to evaluate the probability density by means of Eq. (2) remains physically consistent in the case that det ( 0 , 0 , ) < 0.

B DEMONSTRATION OF THE KERNEL DOMAIN SIZE VARYING PROPORTIONATELY TO THE JACOBIAN DETERMINANT
In order to account for the high degree of variation in the number density field for inertial droplets, it is desired that reconstruction using a kernel-based method is able to vary the size of the kernel domain of support according to the local droplet number density. Within the FLA methodology, the physical interpretation of the Jacobian as the Eulerian-Lagrangian transformation provides the necessary information regarding the extent of the local Eulerian droplet field over which the influence of an individual droplet acts. It is therefore necessary to specify the kernel such that its domain of support is proportionate to |det ( )|. For the specific choice of the multivariate Gaussian kernel in Eq. (11), it can be shown that the domain size for a compact support is proportional to √ det( ). Then for the positive semi-definite specification of = ℎ 2 0 ⋅ ⊤ in Eq. (14), we have that Thus the domain of support for the kernel is proportionate to |det ( ) | as required, meaning that the kernel is able to adaptively scale in accordance with the local Eulerian-Lagrangian transformation so that the spatial structures of the droplet field, which accompany the variation in number density, are captured.